(54) Coding/decoding apparatus, coding/decoding system and multiplexed bit stream



(57) A coding apparatus of the present invention
comprises coding circuit 1 for audio signals, coding
circuit 2 for video signals, interface circuit 3 for input of
scene data, coding circuit 4 for scene data, composition
circuit 5, multiplexing circuit 6, display circuit 7 and clock
generating circuit 8. Each of coding circuits 1, 2 and 4
outputs time information representing a decoding timing,
and composition circuit 5 outputs time information
representing a composition timing. Multiplexing circuit 6
multiplexes the time information together with the
compressed data given from each of coding circuits 1, 2 and
4, thereby generating a bit stream.
Description

[0001] The present invention relates to a coding/decoding
apparatus, a coding/decoding system and a multiplexed bit
stream, and particularly to a system for synchronously
combining and reproducing natural pictures, voices, and
computer graphics.
[0002] MPEG (Moving Picture Experts Group) is known as an
international standard for compressing, multiplexing and
transferring or storing audio signals (or voice signals),
video signals, and artificial scene data such as computer
graphics, and then separating and expanding the signals
and data to obtain the original signals. MPEG is defined by
working group (WG) 11 within SC29, which is managed under
JTC1 (Joint Technical Committee 1), the committee that
handles common items in the data processing fields of ISO
(International Organization for Standardization) and IEC
(International Electrotechnical Commission). MPEG describes
a mechanism for synchronously reproducing each medium from
multiplexed data.

[0003] First, a mechanism for synchronously reproducing an
audio signal and a video signal from multiplexed data is
described in ISO/IEC 13818-1 "Information Technology Generic
Coding of Moving Pictures and Associated Audio Systems"
(popularly called MPEG-2 Systems). Fig. 53 of the
accompanying drawings shows the construction of a fixed delay
model used for the description. This figure shows an
abstracted system architecture when MPEG-2 is applied to
compress audio signals and video signals.
[0004] In Fig. 53, encoder 71 compresses (encodes) an
audio signal, and encoder 72 compresses (encodes) a
video signal. Buffer 73 buffers the audio data compressed
by the encoder 71, and buffer 74 buffers the video data
compressed by the encoder 72. Multiplexing circuit 75
multiplexes the compressed audio data stored in the buffer
73 and the compressed video data stored in the buffer 74.
At this time, a reference clock that is needed for
synchronous reproduction and time stamps are embedded as
additive information into the multiplexed data.

[0005] Specifically, the time stamps are a decoding time
stamp representing a decoding timing and a display time
stamp representing a display timing. The decoding time stamp
is generally used only when interpolative prediction is
carried out. This is because when the interpolative
prediction is carried out, the decoding timing and the
display timing are different from each other in some cases.
In the other cases, the decoding time stamp is unnecessary.
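
The relation between the two time stamps can be illustrated with a minimal Python sketch. It assumes that "interpolative prediction" corresponds to bidirectionally predicted (B) frames, that one frame is decoded per frame period, and that display runs one period behind decoding; the names Frame, stamp and decoding_stamp_needed are hypothetical and do not come from the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    name: str           # label used only for this illustration
    kind: str           # "I", "P" or "B"
    display_index: int  # position in display order, in frame periods


def stamp(display_order):
    """Return (name, decoding_ts, display_ts) tuples in decoding order."""
    decode_order, waiting_b = [], []
    for frame in display_order:
        if frame.kind == "B":
            # A B frame needs the next reference frame decoded first.
            waiting_b.append(frame)
        else:
            decode_order.append(frame)
            decode_order.extend(waiting_b)
            waiting_b = []
    decode_order.extend(waiting_b)
    # Assumption: one frame decoded per period, display one period behind.
    return [(f.name, slot, f.display_index + 1)
            for slot, f in enumerate(decode_order)]


def decoding_stamp_needed(stamped):
    """True when display_ts - decoding_ts is not the same for every frame."""
    return len({display - decode for _, decode, display in stamped}) > 1


with_b = stamp([Frame("I0", "I", 0), Frame("B1", "B", 1),
                Frame("B2", "B", 2), Frame("P3", "P", 3)])
without_b = stamp([Frame("I0", "I", 0), Frame("P1", "P", 1),
                   Frame("P2", "P", 2)])
```

With B frames the reference frame P3 is decoded before B1 and B2 but displayed after them, so the two stamps diverge. Without B frames the display stamp trails the decoding stamp by a constant, which is why a separate decoding time stamp carries no extra information in that case.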

[0006] Storage/transmission device 76 stores or transmits
the multiplexed data created by the multiplexing circuit 75.
Separation circuit (demultiplexing circuit) 77 separates
compressed audio data, compressed video data, and a
reference clock and time stamp used for synchronous
reproduction from the multiplexed data supplied from the
storage/transmission device 76. Buffer 78 buffers the
compressed audio data supplied from the separation circuit
77, and buffer 79 buffers the compressed video data supplied
from the separation circuit 77. Decoder 80 decodes and
reproduces the compressed audio data stored in the buffer
78, and decoder 81 decodes and displays the compressed
video data stored in the buffer 79.
[0007] The synchronous reproduction of the audio signals
and video signals in Fig. 53 is implemented as follows.
The reference clock embedded in the multiplexed data is
used to control the oscillation frequency of a clock
generating circuit for driving the decoder 80 and decoder
81, for which a PLL (phase-locked loop) is generally used.
The synchronization between the encoder side and the
decoder side is established by the PLL.
The time stamp embedded in the multiplexed data is 
used to transmit the decoding timing of the decoder 80 
and decoder 81 or the reproduction/display timing of the 
decoding result. The time axes of the encoder side and 
decoder side are synchronized with each other with a 
fixed delay being set therebetween by the reference 
clock, and the decoding operation is started at the time 
which is intended at the encoder side and the reproduc- 
tion/display is carried out. 
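
As a rough illustration of the fixed delay model, the following Python sketch slaves a decoder time base to a received reference clock value and shows that a time-stamped action then occurs at the encoder-intended time plus the channel delay. It is a simplification: it only aligns the phase of the decoder clock and does not model the frequency tracking that a PLL performs. DecoderClock, CHANNEL_DELAY and LOCAL_PHASE are hypothetical names and values chosen for the example.

```python
class DecoderClock:
    """Decoder-side time base slaved to the reference clock in the stream."""

    def __init__(self):
        self.offset = None  # reference value minus local oscillator reading

    def on_reference(self, reference_value, local_reading):
        # Align the decoder time base with the embedded reference clock.
        self.offset = reference_value - local_reading

    def local_time_for(self, time_stamp):
        # Local oscillator reading at which a time-stamped action is due.
        if self.offset is None:
            raise RuntimeError("no reference clock received yet")
        return time_stamp - self.offset


CHANNEL_DELAY = 250   # hypothetical fixed delay, in clock ticks
LOCAL_PHASE = 7000    # hypothetical phase of the free-running local oscillator

clock = DecoderClock()
# Reference value 1000 was sampled at encoder time 1000 and arrives
# CHANNEL_DELAY ticks later; the local oscillator reads wall time + phase.
arrival_wall_time = 1000 + CHANNEL_DELAY
clock.on_reference(1000, arrival_wall_time + LOCAL_PHASE)

# A unit stamped 1500 by the encoder becomes due at wall time 1500 + delay.
due_local = clock.local_time_for(1500)
due_wall_time = due_local - LOCAL_PHASE
```

Every time stamp is shifted by the same amount, so the relative timing chosen at the encoder side is preserved at the decoder side.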

[0008] Accordingly, the synchronous reproduction of 
the audio signals and video signals can be implemented 
insofar as a suitable time stamp is set at the encoder 
side. In the case of an application in which synchronous 
reproduction isn't needed between the encoder side 
and the decoder side, the synchronous reproduction is 
carried out with the clock of the decoder itself without 
using the reference clock. 

[0009] Next, ISO/IEC JTC1/SC29/WG11 N1825 "Working Draft
5.0 of ISO/IEC 14496-1" (popularly called MPEG-4 Systems)
describes a mechanism for synchronously reproducing audio
signals, video signals, and artificial scene data such as
computer graphics from multiplexed data.

[0010] Fig. 54 shows a system decoder model (SDM) used for
the description of the above mechanism. This model is an
abstracted system decoder when MPEG-4 is applied to
compress audio signals, video signals, and artificial scene
data such as computer graphics. In this working draft, no
detailed description is made of the model and concrete
construction of the encoder; however, the syntax specifies
that a reference clock and a time stamp are embedded as
additive information in the multiplexed data. Specifically,
two time stamps are provided: a decoding time stamp
representing a decoding timing, and a composite time stamp
representing a timing at which decoded data can be supplied
to a composition circuit.

EP 0 924 934 A1

[0011] In Fig. 54, a separation circuit 91 separates from
the multiplexed data compressed audio data, compressed
video data, compressed scene data, and a reference clock
and a time stamp used for synchronous reproduction. Buffer
92 buffers the compressed audio data supplied from the
separation circuit 91, and buffer 93 buffers the compressed
video data supplied from the separation circuit 91. Buffer
94 buffers the compressed artificial scene data supplied
from the separation circuit 91. Decoder 95 decodes the
compressed audio data stored in the buffer 92, decoder 96
decodes the compressed video data stored in the buffer 93,
and decoder 97 decodes the compressed artificial scene data
stored in the buffer 94.

[0012] Buffer 98 buffers the audio signal decoded by the
decoder 95, buffer 99 buffers the video signal decoded by
the decoder 96, and buffer 100 buffers the artificial scene
data decoded by the decoder 97. Composition circuit 101
composes a scene on the basis of the audio signal stored in
the buffer 98, the video signal stored in the buffer 99 and
the artificial scene data stored in the buffer 100. At this
time, the scene information that is composed is described
in the artificial scene data, and in accordance with the
scene information the audio signal is modulated or the
video signal is deformed, and the signal is mapped to an
object in the scene. Display circuit 102 reproduces/displays
a scene supplied from the composition circuit 101.
[0013] The composition and reproduction of the audio
signal, the video signal and the artificial scene data in
Fig. 54 are implemented as follows:
[0014] A reference clock can be provided for every decoder.
After it is picked up from the multiplexed data, it is input
to a clock generating circuit provided for every decoder in
order to control the oscillation frequency of the clock
generating circuit, whereby the synchronization between the
encoder side and the decoder side can be established for
every decoder. A time stamp can also be provided for every
decoder. After it is picked up from the multiplexed data, it
is used to transmit the decoding timing of the decoder or
the time at which the decoding result can be supplied to
the composition circuit 101. The time axes of the encoder
side and the decoder side are synchronized with each other
with a fixed delay being set therebetween by the reference
clock, and the decoding is started at the time intended by
the encoder side and the writing operation into the buffer
is carried out.

[0015] Subsequently, the composition circuit 101 takes out
the audio signal, the video signal and the artificial scene
data held in each buffer to perform scene composition. The
times at which the audio signal, the video signal and the
scene data are obtained by the composition circuit 101 are
respectively given on the basis of the composite time
stamps added to these signals and data. However, the timing
for composing a scene is unclear, and the composition
circuit 101 itself is set to start event processing in
accordance with a discrete time event described in the
scene data. Finally, the display circuit 102 reproduces and
displays the scene supplied from the composition circuit
101.
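
The role of the composite time stamp can be sketched as follows. This is only an illustration under the assumption that each buffer in front of the composition circuit exposes the most recent unit whose composite time stamp has been reached; the class CompositionBuffer and its methods are hypothetical and are not taken from the working draft. Note that the sketch says nothing about when the composition circuit itself runs, which is the point left open by the model.

```python
import bisect


class CompositionBuffer:
    """Holds decoded units keyed by their composite time stamp."""

    def __init__(self):
        self._stamps = []  # composite time stamps, kept sorted
        self._units = []   # decoded units, parallel to _stamps

    def write(self, composite_ts, unit):
        # Insert while keeping both lists ordered by time stamp.
        index = bisect.bisect_right(self._stamps, composite_ts)
        self._stamps.insert(index, composite_ts)
        self._units.insert(index, unit)

    def latest(self, now):
        # Most recent unit that may be supplied to the composition circuit.
        index = bisect.bisect_right(self._stamps, now)
        return None if index == 0 else self._units[index - 1]


video = CompositionBuffer()
video.write(140, "frame-2")
video.write(100, "frame-1")
```

A composition circuit that polls at time 139 would still obtain "frame-1", while one that polls at time 150 would obtain "frame-2"; the result therefore depends on a composition time that the stream does not specify.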
[0016] Further, as a representative example of artificial
scene data, VRML (Virtual Reality Modeling Language) is
known as a description format for describing computer
graphics, transmitting or storing the data thus described,
and building and sharing a virtual three-dimensional space
on the basis of the data. VRML is defined as an
international standard by SC24, managed under JTC1 (Joint
Technical Committee 1) for handling common items in the
data processing fields of ISO (International Organization
for Standardization) and IEC (International Electrotechnical
Commission), and by a VRML consortium in which associated
companies cooperate with each other. VRML further describes
a method of taking an audio signal and a video signal into
a scene.

[0017] The details of the description method are described
in ISO/IEC DIS 14772-1 "The Virtual Reality Modeling
Language" (popularly called VRML97). In ISO/IEC DIS 14772-1,
not only computer graphics but also ISO/IEC 11172
(popularly called MPEG-1), which is one of the MPEG
standards, is contained as a support target. MPEG-1 is one
of the international coding standards for audio signals and
video signals. Specifically, the audio signals and the
video signals are mapped as a sound source and as a moving
picture texture for a three-dimensional object,
respectively, in a three-dimensional scene constructed by
VRML. Further, the description of a time event is supported
in VRML, and a time event occurs according to a time stamp
described in the VRML format.
[0018] The time event is further classified into two types:
a continuous time event and a discrete time event. The
continuous time event is an event in which the action of an
animation or the like is continuous on the time axis, and
the discrete time event is an event in which an object in a
scene starts after a time elapses.
[0019] Fig. 55 shows the construction of a decoding
processing system that receives the VRML format and
constructs a three-dimensional scene (called a "browser"
in VRML). Buffer 111 receives through the internet
multiplexed data compressed by MPEG-1 and buffers the data
received. Buffer 112 receives through the internet the VRML
format or the compressed VRML format and buffers the format
received. At this time, the original place of the VRML
format may be different from that of the MPEG-1 data.

[0020] Separation circuit 113 separates compressed audio
data and compressed video data from the MPEG-1 multiplexed
data supplied from the buffer 111. Decoder 114 decodes the
compressed audio data supplied from the separation circuit
113, and decoder 115 decodes the compressed video data
supplied from the separation circuit 113. Decoder 116
decodes the compressed VRML format stored in the buffer
112. When the VRML format is not compressed, no action is
taken. Memory 117 stores the audio signal decoded by the
decoder 114, and memory 118 stores the video signal decoded
by the decoder 115. Memory 119 stores the VRML format
decoded by the decoder 116.
[0021] Composition circuit 120 synthesizes a scene on the
basis of the audio signal stored in the memory 117, the
video signal stored in the memory 118 and the artificial
scene data stored in the memory 119. In this case, scene
information to be composed is described in the artificial
scene data. According to the scene information, the audio
signal is modulated and the video signal is deformed, and
then these signals are mapped into an object in the scene.
Display circuit 121 reproduces/displays the scene supplied
from the composition circuit 120.

[0022] The composition of the audio signal, the video
signal and the VRML format in Fig. 55 and the reproduction
thereof are implemented as follows:
[0023] After the loading of the MPEG-1 multiplexed data
from the outside into the buffer 111 is finished, the
decoder 114 decodes the compressed audio data and the
decoder 115 decodes the compressed video data, and the
audio signal and the video signal obtained through the
above decoding operation are written into the memory 117
and the memory 118, respectively. Further, after the
loading of the VRML format from the outside into the buffer
112 is finished, the decoder 116 decodes the VRML format
when the VRML format is compressed or takes no action when
it is not compressed, and then writes the VRML format thus
obtained into the memory 119. After the above processing is
finished, that is, after the processing of the part
surrounded by a dotted line indicated by reference numeral
122 is finished, the composition circuit 120 and the
display circuit 121 start operating to perform composition
(mixing), reproduction and display.
[0024] On the other hand, when only the video signal and
the computer graphics are to be combined with each other, a
chromakey system, which is already used for weather
forecasts in present broadcasting systems, is known.
According to the chromakey system, a person or an object is
placed in front of a background whose color is specified to
be a single color such as blue, an overall picture is shot,
and then the background-colored portion is deleted from the
picture, whereby only the person or the object in front of
the background can be picked up.

[0025] Fig. 56 shows the construction of a coding
processing system for creating a composite picture of the
video signal and the computer graphics by using the
chromakey system, and compressing and multiplexing the
composite picture and the audio signal. Chromakey
processing circuit 131 deletes from an input video signal a
portion having the color coincident with the background
color. Composition circuit 132 creates a computer graphics
image from given artificial scene data. Memory 133 stores a
cut-out picture supplied from the chromakey processing
circuit 131. In this case, memory 133 may store the picture
data directly and merely inform a subsequent-stage
convolution circuit 135 that the RGB value corresponding to
the background color is deleted. Memory 134 stores the
computer graphics picture generated by the composition
circuit 132. The convolution circuit 135 overwrites the
cut-out picture obtained from the memory 133 on the
computer graphics image obtained from the memory 134. It is
also allowable to detect the RGB value corresponding to the
background color and replace only pixels located within a
specified range by a computer graphics image.
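
The overwrite step performed by the convolution circuit 135 can be sketched in Python as follows. The sketch assumes pictures are lists of rows of (R, G, B) tuples and treats "a specified range" as a per-channel tolerance around the key color; the function names, the default key color and the tolerance parameter are assumptions made for this example only.

```python
def is_background(pixel, key, tolerance):
    # A pixel counts as background when every channel is within the range.
    return all(abs(p - k) <= tolerance for p, k in zip(pixel, key))


def chromakey_composite(video, graphics, key=(0, 0, 255), tolerance=0):
    """Keep foreground video pixels; fill background pixels from graphics."""
    same_shape = len(video) == len(graphics) and all(
        len(v_row) == len(g_row) for v_row, g_row in zip(video, graphics))
    if not same_shape:
        raise ValueError("video and graphics must have the same dimensions")
    return [
        [g if is_background(v, key, tolerance) else v
         for v, g in zip(v_row, g_row)]
        for v_row, g_row in zip(video, graphics)
    ]


video_row = [[(0, 0, 255), (200, 150, 100), (5, 5, 250)]]
graphics_row = [[(1, 1, 1), (2, 2, 2), (3, 3, 3)]]
exact = chromakey_composite(video_row, graphics_row)
ranged = chromakey_composite(video_row, graphics_row, tolerance=10)
```

With an exact match only the pure blue pixel is replaced; with a tolerance of 10 the nearly blue third pixel is replaced as well, which corresponds to the range-based variant mentioned at the end of the paragraph.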
[0026] Encoder 136 compresses (encodes) the audio 
signal. Encoder 137 compresses the composite picture 
obtained from the convolution circuit 135. Buffer 138 
buffers the audio data compressed by the encoder 136, 
and buffer 139 buffers the composite picture data com- 
pressed by the encoder 137. Multiplexing circuit 140 
multiplexes the compressed audio data stored in the 
buffer 138 and the compressed composite picture data 
stored in the buffer 139. At this time, the reference clock 
which is necessary for the synchronous reproduction 
and the time stamp are embedded as additive informa- 
tion into the multiplexed data. 

[0027] The creation of the composite picture of the video
signal and computer graphics is performed in the portion
surrounded by a dotted line indicated by reference numeral
141. The other portions correspond to the coding portion of
the coding/decoding system shown in Fig. 53. That is, the
video signal and the computer graphics are first combined
with each other to obtain a composite picture, and then the
composite picture and the audio signal are compressed and
multiplexed. The construction of the decoding side is the
same as that of Fig. 53.

[0028] The coding/decoding synchronous reproduc- 
tion system of the audio signal and the video signal 
shown in Fig. 53 relates to the coding, multiplexing, sep- 
arating and decoding for the audio signal and the video 
signal, and no description is made on the processing of 
artificial scene data such as computer graphics. 
[0029] Further, in the decoding synchronous repro- 
duction system of the audio signal, the video signal and 
the artificial scene data shown in Fig. 54, the decoding 
timing and the timing at which each data may be sup- 
plied to the composition circuit are given. However, the 
timing at which all the data are composed and the timing 
at which the composite picture is displayed are not 
specified. In other words, the composition circuit is set 
to start its composite operation freely. Further, it is sug- 
gested that the composition (mixing) is started in 
accordance with a discrete time event described in the 
artificial scene data. 

[0030] However, the artificial scene data suffers a buffer
delay in the decoding operation, and thus a desired time
may have passed by the time the artificial scene data are
supplied to the composition circuit 101. Therefore, the
artificial scene data itself cannot be used to give an
accurate timing for composing. Further, when a continuous
time event is described in the artificial scene data, the
composition start time is different between the coding side
and the decoding side in some cases. Therefore, occurrence
of an accurately coincident continuous time event cannot be
ensured. Particularly, in the case of animation or the like
for which motion is required to be continuously
represented, the position of a moving object is displaced
between the coding side and the decoding side. Due to the
above problem, a composite picture desired by the coding
side cannot be reproduced in exact coincidence at the
decoding side.
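
The displacement described above can be made concrete with a small numeric sketch. It assumes a continuous time event that moves an object linearly between two positions over a fixed cycle; the function name and all numbers are hypothetical and serve only to illustrate the effect of differing composition start times.

```python
def position(start, end, cycle, elapsed):
    """Linearly interpolated position of an object in a continuous event."""
    fraction = min(max(elapsed / cycle, 0.0), 1.0)
    return start + (end - start) * fraction


CYCLE = 4.0      # hypothetical duration of the animation, in seconds
START_LAG = 0.5  # hypothetical delay of the composition start at the decoder

# The coding side composes the frame 2.0 s after the event started.
encoder_side = position(0.0, 100.0, CYCLE, 2.0)
# The decoding side started the event START_LAG later, so less time elapsed.
decoder_side = position(0.0, 100.0, CYCLE, 2.0 - START_LAG)
displacement = encoder_side - decoder_side
```

For the same displayed frame the object is at 50.0 on the coding side but only at 37.5 on the decoding side, so the two composite pictures differ unless the composition timing itself is conveyed.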

[0031] Further, the decoding and reproducing system of the
audio signal, the video signal and the artificial scene
data shown in Fig. 55 does not support stream data which
are transmitted continuously on the time axis. That is, the
processing of a portion 122 surrounded by a dotted line
must be finished before the reproduction is started.

[0032] Still further, in the coding/decoding synchronous
reproducing system of the audio signal, the video signal
and the artificial scene data shown in Fig. 56, the
composite picture is degenerated into a mere
two-dimensional picture at the coding side, and thus an
interaction function which would be obtained by using the
artificial scene data is lost. That is, there is a
disadvantage that additive functions such as movement of a
visual point in the three-dimensional space, and navigation
cannot be implemented.

[0033] An object of the present invention is to provide 
a coding apparatus, a decoding apparatus, a cod- 
ing/decoding system and a multiplexed bit stream which 
implements coding/decoding synchronous reproduction 
of an audio signal, a video signal and artificial scene 
data while excluding the disadvantage of the conven- 
tional systems described above, ensuring generation of 
a composite picture desired at the coding side, support- 
ing stream data transmitted continuously on time axis, 
and supporting the interaction function in the decoding 
side. 

[0034] A coding apparatus according to the present
invention comprises: audio signal coding means for coding
an audio signal; video signal coding means for coding a
video signal; interface means for accepting information on
a composite scene; scene data coding means for coding scene
data supplied from the interface means; composition means
for composing a scene from the audio signal supplied from
the audio signal coding means, the video signal supplied
from the video signal coding means and the composite scene
data supplied from the scene data coding means; display
means for reproducing/displaying the composite picture
signal and the audio signal supplied from the composition
means; clock supply means for supplying clocks to the audio
signal coding means, the video signal coding means, the
scene data coding means and the composition means; and
multiplexing means for creating a bit stream on the basis
of the time information and compressed audio data supplied
from the audio signal coding means, the time information
and compressed video data supplied from the video signal
coding means, the time information and compressed scene
data supplied from the scene data coding means, the time
information supplied from the composition means and the
clock value supplied from the clock supply means.
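
The inputs of the multiplexing means listed above can be pictured with a minimal Python sketch. The packet layout, the stream labels and the ordering rule (clock value first, remaining packets in time order) are assumptions made for illustration; the actual bit stream format is the one shown in the drawings, not this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    stream: str               # "clock", "audio", "video", "scene", "composition"
    time_info: Optional[int]  # clock value, decoding time or composition time
    payload: bytes = b""      # compressed data; empty for clock/composition


def multiplex(clock_value, coded_units, composition_times):
    """Combine coded units, composition time information and a clock value.

    coded_units is an iterable of (stream, decoding_time, compressed_data).
    """
    packets = [Packet(stream, time, data) for stream, time, data in coded_units]
    packets += [Packet("composition", time) for time in composition_times]
    # Stable sort: packets with equal time keep their input order.
    packets.sort(key=lambda packet: packet.time_info)
    return [Packet("clock", clock_value)] + packets


bit_stream = multiplex(
    clock_value=0,
    coded_units=[("video", 40, b"v0"), ("audio", 20, b"a0"),
                 ("scene", 20, b"s0")],
    composition_times=[60],
)
order = [(p.stream, p.time_info) for p in bit_stream]
```

The point of the sketch is that the composition time travels in the stream next to the decoding times, so the decoding side does not have to infer it from the scene data.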
[0035] According to the present invention, the coding
apparatus further comprises means for detecting the status
of the composition means and controlling the operation of
the coding means of the video signal.
[0036] According to the present invention, the coding
apparatus further comprises means for detecting the status
of the coding means for the audio signal, the status of the
coding means for the video signal and the status of the
coding means for the scene data, and controlling the
operation of the composition means.
[0037] According to the coding apparatus of the present
invention, the clock supply means includes first clock
supply means for supplying clocks to the audio signal
coding means, second clock supply means for supplying
clocks to the video signal coding means and third clock
supply means for supplying clocks to the scene data coding
means and composition means, and the multiplexing means
multiplexes the clock values supplied from the first,
second, and third clock supply means respectively.

[0038] According to the coding apparatus of the present
invention, the clock supply means includes first clock
supply means for supplying clocks to the audio signal
coding means, second clock supply means for supplying
clocks to the video signal coding means and composition
means, and third clock supply means for supplying clocks to
the scene data coding means, and the multiplexing means
multiplexes the clock values supplied from the first,
second, and third clock supply means respectively.

[0039] A decoding apparatus according to the present
invention comprises: means for separating both of
compressed data and time information of an audio signal,
both of compressed data and time information of a video
signal, both of compressed data and time information of
scene data, time information of scene composition and clock
information from a bit stream; means for decoding the audio
signal on the basis of the compressed data and time
information of the audio signal; means for decoding the
video signal on the basis of the compressed data and time
information of the video signal; means for decoding the
scene data on the basis of the compressed data and time
information of the scene data; means for composing a scene
on the basis of the time information for the scene
composition supplied from the separation means, the audio
signal supplied from the decoding means for the audio
signal, the video signal supplied from the decoding means
for the video signal and the scene data supplied from the
decoding means for the scene data; means for generating
clocks according to the clock value supplied from the
separation means and supplying the clocks to the decoding
means for the audio signal, the decoding means for the
video signal, the decoding means for the scene data and the
composition means; means for reproducing/displaying the
composite picture signal and the audio signal supplied from
the composition means; and interface means for accepting an
interaction from a viewer to the composite picture.

[0040] According to a first embodiment of the decoding
apparatus, the separation means separates a plurality of
independent clock values from the bit stream, and the
independent clock values are input to means for supplying
the clocks to the decoding means for the audio signal,
means for supplying the clocks to the decoding means for
the video signal, and means for supplying the clocks to the
decoding means for the scene data and the composition
means.
[0041] According to a second embodiment of the decoding
apparatus, the separation means separates a plurality of
independent clock values from the bit stream, and the
independent clock values are input to means for supplying
the clocks to the decoding means for the audio signal,
means for supplying the clocks to the decoding means for
the video signal and the composition means, and means for
supplying the clocks to the decoding means for the scene
data.
[0042] A multiplexed bit stream according to the present
invention comprises an audio signal, a video signal and
scene data, characterized in that a flag representing
whether time information representing a decoding timing
doubles as time information representing a composition
timing is added to said time information.
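
One way to picture such a flag is the following Python sketch, which serializes the time information of a single unit. The field widths (a one-byte flag followed by 32-bit big-endian time values) and the function names are purely hypothetical; the text only states that a flag is added, not how it is encoded.

```python
import struct

DOUBLES_AS_COMPOSITION = 1  # decoding time is also the composition time
SEPARATE_COMPOSITION = 0    # an explicit composition time follows


def pack_time_info(decoding_time, composition_time=None):
    """Serialize time information, omitting a redundant composition time."""
    if composition_time is None or composition_time == decoding_time:
        return struct.pack(">BI", DOUBLES_AS_COMPOSITION, decoding_time)
    return struct.pack(">BII", SEPARATE_COMPOSITION,
                       decoding_time, composition_time)


def unpack_time_info(data):
    """Return (decoding_time, composition_time, bytes_consumed)."""
    flag, decoding_time = struct.unpack_from(">BI", data, 0)
    if flag == DOUBLES_AS_COMPOSITION:
        return decoding_time, decoding_time, 5
    (composition_time,) = struct.unpack_from(">I", data, 5)
    return decoding_time, composition_time, 9


shared = pack_time_info(3000)
separate = pack_time_info(3000, 3600)
```

When the flag is set, the second time value is not transmitted at all, which is the saving the flag makes possible.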


Fig. 1 is a block diagram showing a first embodiment of a
coding apparatus according to the present invention;
Fig. 2 is a block diagram showing the construction of a
coding circuit of Fig. 1;
Fig. 3 is a first block diagram showing the construction of
a composition circuit of Fig. 1;
Fig. 4 is a block diagram showing the construction of a
multiplexing circuit of Fig. 1;
Fig. 5 is a block diagram showing a second embodiment of
the coding apparatus according to the present invention;
Fig. 6 is a block diagram showing the construction of a
coding circuit of Fig. 5;
Fig. 7 is a first block diagram showing the construction of
a composition circuit of Fig. 5;
Fig. 8 is a block diagram showing a third embodiment of the
coding apparatus according to the present invention;
Fig. 9 is a block diagram showing the construction of a
coding circuit of Fig. 8;
Fig. 10 is a first block diagram showing the construction
of a composition circuit of Fig. 8;
Fig. 11 is a block diagram showing a fourth embodiment of
the coding apparatus according to the present invention;
Fig. 12 is a block diagram showing the construction of a
multiplexing circuit of Fig. 11;



Fig. 13 is a block diagram showing a fifth embodiment of
the coding apparatus according to the present invention;
Fig. 14 is a block diagram showing a sixth embodiment of
the coding apparatus according to the present invention;
Fig. 15 is a block diagram showing a seventh embodiment of
the coding apparatus according to the present invention;
Fig. 16 is a block diagram showing an eighth embodiment of
the coding apparatus according to the present invention;
Fig. 17 is a block diagram showing a ninth embodiment of
the coding apparatus according to the present invention;
Fig. 18 is a block diagram showing a first embodiment of a
decoding apparatus according to the present invention;
Fig. 19 is a block diagram showing the construction of a
separation circuit of Fig. 18;
Fig. 20 is a block diagram showing the construction of a
decoding circuit of Fig. 18;
Fig. 21 is a first block diagram showing the construction
of a composition circuit of Fig. 18;
Fig. 22 is a block diagram showing a second embodiment of
the decoding apparatus according to the present invention;
Fig. 23 is a block diagram showing the construction of a
separation circuit of Fig. 22;
Fig. 24 is a block diagram showing a third embodiment of
the decoding apparatus according to the present invention;
Fig. 25 is a block diagram showing a coding/decoding system
according to the present invention;
Fig. 26 is a diagram showing a bit stream generated by the
coding apparatus according to the first embodiment of the
present invention;
Fig. 27 is a diagram showing a bit stream generated by the
coding apparatus according to the fourth embodiment of the
present invention;
Fig. 28 is a time chart for normal coding, decoding and
composition;
Fig. 29 is a time chart for coding, decoding and
composition when excessive time is needed for composition;
Fig. 30 is a time chart for coding, decoding and
composition, which is solved by the coding apparatus of the
second embodiment of the present invention;
Fig. 31 is a time chart for normal coding, decoding and
composition in the case of plural inputs;
Fig. 32 is a first time chart for coding, decoding and
composition when excessive time is needed for composition
in the case of plural inputs;
Fig. 33 is a first time chart for coding, decoding and
composition in the case of plural inputs, which is solved
by the coding apparatus of the second embodiment of the
present invention;
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Fig. 34 is a second time chart for coding, decoding 
and composition when excessive time is needed for 
composition in the case of plural inputs; 
Fig. 35 is a second time chart for coding, decoding 
and composition in the case of plural inputs, which 5 
is solved by the coding apparatus of the second 
embodiment of the present invention; 
Fig. 36 is a time chart for coding, decoding and 
composition, which is solved by the coding appara- 
tus of the third embodiment of the present inven- 10 
tion; 

Fig. 37 is a time chart for coding, decoding and
composition in the case of plural inputs, which is
solved by the coding apparatus of the third
embodiment of the present invention;
Fig. 38 is a diagram showing data flow among a
buffer in a decoding circuit, a memory in the
decoding circuit and a composition circuit;
Fig. 39 is a time chart for normal decoding and
composition;
Fig. 40 is a time chart for decoding and composition
when excessive time is needed for composition;
Fig. 41 is a time chart for decoding and composition,
which is solved by the decoding apparatus of
the first embodiment of the present invention;
Fig. 42 is a time chart for normal decoding and
composition in the case of plural inputs;
Fig. 43 is a time chart for decoding and composition
when excessive time is needed for composition in
the case of plural inputs;
Fig. 44 is a time chart for decoding and composition
in the case of plural inputs, which is solved by the
decoding apparatus of the first embodiment of the
present invention;

Fig. 45 is a second block diagram showing the construction
of the composition circuit of Fig. 1;
Fig. 46 is a second block diagram showing the
construction of the composition circuit of Fig. 5;
Fig. 47 is a second block diagram showing the
construction of the composition circuit of Fig. 8;
Fig. 48 is a second block diagram showing the
construction of the composition circuit of Fig. 18;
Fig. 49 is a diagram showing another example of a
bit stream generated by the coding apparatus of the
first embodiment of the present invention;
Fig. 50 is a diagram showing another example of a
bit stream generated by the coding apparatus of the
fourth embodiment of the present invention;
Fig. 51 is a block diagram showing a tenth embodiment
of the coding apparatus of the present invention;

Fig. 52 is a block diagram showing the fourth 
embodiment of the decoding apparatus of the 
present invention; 

Fig. 53 is a diagram showing a conventional
coding/decoding synchronous reproducing system for
audio signals and video signals;
Fig. 54 is a diagram showing a conventional decoding
synchronous reproducing system for audio signals,
video signals and artificial scene data;

Fig. 55 is a diagram showing a conventional decoding
reproducing system for audio signals, video signals
and artificial scene data; and
Fig. 56 is a diagram showing a conventional
coding/decoding synchronous reproducing system for
audio signals, video signals and artificial scene
data.

[0043] Preferred embodiments according to the
present invention will be described hereunder with
reference to the accompanying drawings.
[0044] Fig. 1 is a block diagram showing a first embod- 
iment of a coding apparatus according to the present 
invention. The coding apparatus shown in Fig. 1 com- 
prises a coding circuit 1 for audio signals (hereinafter 
referred to as "audio coding circuit"), a coding circuit 2 
for video signals (hereinafter referred to as "video cod- 
ing circuit"), an interface circuit 3 for input of scene data, 
a coding circuit 4 for scene data (hereinafter referred to 
as "scene coding circuit"), a composition circuit 5, a 
multiplexing circuit 6, a display circuit 7 and a clock gen- 
erating circuit 8. 

[0045] The audio coding circuit 1 compresses an
audio signal input thereto, and outputs the compressed
data, a time stamp representing a decoding timing and
locally decoded audio data. The video coding
circuit 2 compresses a video signal input thereto, and
outputs the compressed data, a time stamp representing
a decoding timing and locally decoded video data.
In place of the video signal, text data, graphics
data or the like may be coded in some cases.
[0046] The interface circuit 3 for the input of the scene
data accepts descriptions and updates of composite scenes
from a transmitter, and outputs them as scene data. A
keyboard input, a mouse input or the like may be used as
the interface. The scene coding circuit 4 receives the
scene data from the interface circuit 3, and outputs the
compressed data of the scene data, a time stamp
representing a decoding timing and locally decoded
scene data. The time stamp generated in each coding
circuit may be the same as in ISO/IEC
JTC1/SC29/WG11 N1825 described in the conventional
technique above, and a decoding time
stamp and a composite time stamp are used.
[0047] The decoding time stamp is used only for an
interpolation-predicted picture, and only the composite
time stamp is used for video, audio and scene data of
the other prediction modes. That is, the decoding timing
and the timing at which the decoded data is allowed to
be used by the composition circuit 5 are assumed to be
equal to each other. However, it is important that a fixed
delay is set between the coding apparatus or a storage
medium and the decoding apparatus, and the decoding
in the decoding apparatus may be completed after a
fixed time elapses from the time represented by the time
stamp.
[0048] The composition circuit 5 receives the audio
signal output from the audio coding circuit 1, the video
signal output from the video coding circuit 2 and the
scene data output from the scene coding circuit 4 to
compose a scene according to a scene description
described in the scene data, and outputs a composite
picture, the audio signal and the time stamp representing
the composition timing. This time stamp is not
shown in ISO/IEC JTC1/SC29/WG11 N1825, and in this
specification it is called the "display time stamp". That is,
the composition timing and the display timing are
assumed to be equal to each other. However, it is important
that a fixed delay is set between the coding apparatus
or the storage medium and the decoding apparatus,
and the composition in the decoding apparatus may be
completed after a fixed time elapses from the time
represented by the time stamp.

[0049] The multiplexing circuit 6 receives both the
compressed data and the time stamp representing the
decoding timing which are output from the audio coding
circuit 1, both the compressed data and the time
stamp representing the decoding timing which are output
from the video coding circuit 2, both the compressed
data and the time stamp representing the
decoding timing which are output from the scene coding
circuit 4, the time stamp representing the composition
timing which is output from the composition circuit 5,
and clocks supplied from a clock generating circuit 8
described later, and generates and outputs a bit stream.
[0050] The display circuit 7 receives the composite
picture signal and the audio signal which are output
from the composition circuit 5, and displays/reproduces
the data through a display for video data and through a
speaker or the like for audio data. The clock generating
circuit 8 generates clocks as clock inputs (CLK) to the
audio coding circuit 1, the video coding circuit 2, the
scene coding circuit 4, the composition circuit 5 and the
multiplexing circuit 6.

[0051] Fig. 2 shows the construction of the audio coding
circuit 1, the video coding circuit 2 and the scene
coding circuit 4. The input signals to the respective coding
circuits differ from one another; however, the
respective coding circuits share a functionally common
structure which comprises encoder 11, decoder 12,
memory 13, buffer 14 and buffer 15. The encoder 11
receives the input signal and locally decoded data supplied
from the memory 13 (described later) and outputs
the compressed data. Further, it outputs the time stamp
representing the decoding timing. For example, it outputs
the time at which the coding is finished. The
decoder 12 receives the compressed data output from
the encoder 11 and the locally decoded data supplied
from the memory 13 and outputs new locally decoded
data. The memory 13 stores the locally decoded data
supplied from the decoder 12, and outputs the data to
the encoder 11 and the composition circuit 5. The buffer
14 buffers the time stamp representing the decoding
timing supplied from the encoder 11, and outputs it to
the multiplexing circuit 6. The buffer 15 buffers the
compressed data output from the encoder 11, and outputs
the data to the multiplexing circuit 6. Further, clocks are
supplied from the clock generating circuit 8, and these
clocks are set as clock inputs (CLK) to the encoder 11
and the decoder 12.

[0052] In Fig. 2, the locally decoded data stored in the 
memory 13 are used as an input to the encoder 11 and 
the decoder 12 for a subsequent coding process. How- 
ever, these data may not be used for the subsequent 
coding process in such a case as coding of a still pic- 
ture. 
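The data flow of Fig. 2 can be illustrated with a minimal sketch. It assumes a toy differential coder operating on integer samples, which is only a stand-in for the actual compression method; the class name, attribute names and the use_prediction flag are hypothetical and do not appear in the specification.

```python
from collections import deque


class CodingCircuit:
    """Toy model of Fig. 2: encoder 11, decoder 12, memory 13, buffers 14 and 15."""

    def __init__(self, use_prediction=True):
        self.memory = 0                       # memory 13: locally decoded data
        self.time_stamps = deque()            # buffer 14: decoding time stamps
        self.compressed = deque()             # buffer 15: compressed data
        self.use_prediction = use_prediction  # False models still-picture coding

    def code(self, sample, clock):
        # Encoder 11: code the difference from the locally decoded data
        # (assumed scheme), or the sample itself when prediction is off.
        reference = self.memory if self.use_prediction else 0
        residual = sample - reference
        self.compressed.append(residual)
        self.time_stamps.append(clock)        # e.g. the time coding finished
        # Decoder 12: rebuild the locally decoded data and store it in memory 13.
        self.memory = reference + residual
        return self.memory
```

With prediction enabled, coding the samples 10 and 13 stores the residuals 10 and 3 in the compressed-data buffer, while the memory always holds the value a decoder would reconstruct.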

[0053] Fig. 3 shows the construction of the composi- 
tion circuit 5 of Fig. 1. The composition circuit 5 com- 
prises scene generating circuit 201, buffer 202, 
conversion processing circuit 203, texture generating 
circuit 204, raster circuit 205, delay circuit 206 and 
frame buffer 207. 

[0054] The scene generating circuit 201 receives the 
scene data from the scene coding circuit 4 to generate 
a scene graph, and outputs a scene drawing command 
and intermediate data together with a time stamp repre- 
senting the composition timing. In the case of a two- 
dimensional scene, coordinate data, graphics data, text 
data are generated at every object in a scene. Further, 
the fore-and-aft relationship of respective objects is 
added. In the case of a three-dimensional scene, setting 
of a camera, setting of the angle of field of view, setting 
of a light source, and deletion of objects outside the visual field
are further performed. The buffer 202 buffers the time
stamp representing the composition timing which is
supplied from the scene generating circuit 201.
[0055] The conversion processing circuit 203 receives 
a scene drawing command and intermediate data sup- 
plied from the scene generating circuit 201 to execute 
conversion processing such as coordinate transforma- 
tion, light-source calculation, clipping and outputs new 
intermediate data. Further, it receives a texture from a 
texture generating circuit 204 described later, and maps 
it into an object in a scene. In the case of the two-dimen- 
sional scene, movement, rotation, enlargement, reduc- 
tion of object, and other processing are carried out. In 
the case of the three-dimensional scene, calculation of
the effect of the light source and a hidden-surface
algorithm in the depth direction are further carried out.
Through the above
processing, the position information and the color infor- 
mation of each object in a scene that is viewed from a 
current visual point are determined and output. 
[0056] The texture generating circuit 204 receives the 
video data supplied from the video coding circuit 2, the 
drawing command supplied from the scene generating 
circuit 201 and the coordinate information supplied from 
the conversion processing circuit 203, deforms into a 
texture the video data which are mapped into an object 
in a scene, and then outputs the texture thus obtained. 
The present invention is based on the assumption that
the scene composition is repeated every frame, and
thus the video data generally corresponds to one
picture.

[0057] The raster circuit 205 receives the intermediate
data from the conversion processing circuit 203 to convert
the intermediate data to raster data on a pixel basis.
The delay circuit 206 receives the audio data from the
audio coding circuit 1 to delay the audio data in
consideration of the time taken by the processing executed
from the scene generating circuit 201 to the raster circuit
205, and outputs the audio data thus delayed to the
display circuit 7. The frame buffer 207 stores the raster
data supplied from the raster circuit 205, and outputs
the raster data thus stored to the display circuit 7. The
scene generating circuit 201, the conversion processing
circuit 203, the texture generating circuit 204 and the
raster circuit 205 are supplied with the clocks (CLK)
from the clock generating circuit 8.
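The role of the delay circuit 206 can be pictured as a fixed-length FIFO. The following is a minimal sketch that assumes the composition latency is known as a whole number of sample ticks; the class name DelayLine and its parameters are hypothetical.

```python
from collections import deque


class DelayLine:
    """FIFO that returns each audio sample delay_ticks pushes after it entered."""

    def __init__(self, delay_ticks, fill=0):
        # Pre-fill with silence so the first outputs cover the composition latency.
        self._fifo = deque([fill] * delay_ticks)

    def push(self, sample):
        self._fifo.append(sample)
        return self._fifo.popleft()
```

With delay_ticks set to 2, the first two calls return the fill value and the third call returns the first sample pushed.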
[0058] Fig. 45 shows another embodiment of the composition
circuit 5 of Fig. 1, and the composition circuit 5
comprises interface circuit 21, central processing unit
(CPU) 22, conversion processing circuit 23, raster circuit
24, texture generating circuit 25, frame buffer 26,
delay circuit 27, counter 28 and memory 29. The
respective circuits are connected to one another
through a bus.

[0059] The interface circuit 21 receives the audio data
supplied from the audio coding circuit 1, the video data
supplied from the video coding circuit 2 and the scene
data supplied from the scene coding circuit 4, and outputs
the time stamp representing the composition timing
described later to the multiplexing circuit 6. That is, it
serves as an interface between each circuit connected
to the bus and the outside.

[0060] CPU 22 performs various software processing
such as initial-stage processing needed for scene
composition, generation of a scene graph on the basis of the
scene data supplied from the scene coding circuit 4,
allocation of an operation to each circuit on the basis of
analysis of the scene graph, and, more generally, schedule
management of each circuit resource. Further, it outputs
the time stamp representing the composition timing to
the interface circuit 21, and emulates operation
frequency control by using a clock count
value given from the counter 28 described later.
[0061] The conversion processing circuit 23 performs
the same processing as the conversion processing circuit
203 shown in Fig. 3 in response to the drawing command
from the CPU 22. The raster circuit 24 performs
the same processing as the raster circuit 205 of Fig. 3 in
response to the drawing command from the CPU 22.
The raster data thus finally obtained are written into the
frame buffer 26 described later. The texture generating
circuit 25 performs the same processing as the texture
generating circuit 204 of Fig. 3 in response to the drawing
command from the CPU 22. The frame buffer 26
stores the raster data obtained from the raster circuit 24
and outputs the data thus stored to the display circuit 7.
The delay circuit 27 delays the audio signal from the
audio coding circuit 1 in consideration of the calculation
time for a series of composition processing, and outputs
the audio signal thus delayed to the display circuit 7.
The counter 28 counts the number of clocks supplied
from the clock generating circuit 8, and outputs the
count number to the CPU 22 as occasion demands.
[0062] In this case, the operating frequency of the
CPU 22, the conversion processing circuit 23, the raster
circuit 24 and the texture generating circuit 25 is given by
another clock generating circuit. However, the clocks supplied
from the clock generating circuit 8 may be used instead. The
memory 29 is used to store control data and intermedi- 
ate data needed for the calculation in each of the CPU 
22, the conversion processing circuit 23, the raster cir- 
cuit 24 and the texture generating circuit 25. 
[0063] Fig. 4 is a diagram showing the construction of 
the multiplexing circuit 6 of Fig. 1, and the multiplexing 
circuit 6 comprises multiplexer 31, counter 32, additive 
information holding circuit 33, and buffer 34. The multi- 
plexer 31 multiplexes the compressed data of the audio 
signal and the time stamp representing the decoding 
timing which are supplied from the audio coding circuit
1, the compressed data of the video signal and the time
stamp representing the decoding timing which are supplied
from the video coding circuit 2, the compressed
data of the scene data and the time stamp representing
the decoding timing which are supplied from the scene
coding circuit 4, the time stamp representing the composition
timing supplied from the composition circuit 5, a
clock count value supplied from the counter 32 
described later, and additive information supplied from 
the additive information holding circuit 33 described 
later, and generates and outputs a bit stream. 
[0064] The counter 32 counts the clocks supplied from 
the clock generating circuit 8, and outputs the count 
number. The additive information holding circuit 33 
holds overhead information that is preset to be added 
for generation of a bit stream, and outputs the overhead
information. The buffer 34 buffers the bit stream output 
from the multiplexer 31 and outputs the bit stream. The 
buffer 34 is needed when the present invention is 
applied to a transmission system, however, it is not nec- 
essarily required when the present invention is applied 
to a storage system. 

[0065] Next, the operation of the coding apparatus 
according to the present invention will be described with 
reference to Figs. 1 to 4 and Fig. 45. 
[0066] Each of the audio coding circuit 1, the video 
coding circuit 2 and the scene coding circuit 4 performs 
compression coding on the input signal thereto, and 
also outputs the time stamp representing the decoding 
timing. As shown in Fig. 2, the encoder 11 first performs
compression processing by using the input signal and
the locally-decoded data output from the memory 13,
and writes the compressed data into the buffer 15. At
the same time, the encoder 11 outputs the time stamp
representing the decoding timing, and writes the time
stamp into the buffer 14. Subsequently, the decoder 12
decodes the compressed data supplied from the
encoder 11, and adds the data thus
decoded to the locally-decoded data supplied from the
memory 13 to create new locally-decoded data. This
locally-decoded data is newly written into the memory 13.

[0067] The interface circuit 3 for the scene data supports
various input modes for scene design and scene
update, such as a keyboard input and a mouse input, and it
converts input data to coherent scene data and outputs
the data thus obtained to the scene coding circuit 4.
With respect to specific scene data, use of data replacement
and data differential may be considered, analogous to
the concepts of intra-frame coding and inter-frame
coding of video signals. The switching between
the data replacement and the data differential is managed
by the scene coding circuit 4 in response to an
instruction from the interface circuit 3. Since VRML is
originally text data, a mode may also be considered in
which compression is not performed and scene data are
directly transmitted.

[0068] The composition circuit 5 performs the scene 
composition by using the audio data obtained from the 
audio coding circuit 1, the video data obtained from the 
video coding circuit 2 and the scene data obtained from 
the scene coding circuit 4. At the same time, it outputs 
the time stamp representing the composition timing. In
this case, the locally-decoded data stored in the memory
of each coding circuit is used directly as the respective
data. More specifically, as shown in Fig. 3, the scene
generating circuit 201 creates a scene graph on the 
basis of the scene data supplied from the scene coding 
circuit 4, and outputs the scene drawing command and 
the intermediate data. At this time, it outputs the time 
stamp representing the composition timing at the same 
time, and writes it into the buffer 202. Subsequently, the 
conversion processing circuit 203 executes the above 
conversion processing on the basis of the drawing com- 
mand from the scene generating circuit 201, and out- 
puts the coordinate information and the color 
information of an object. 

[0069] Further, the texture data supplied from the tex- 
ture generating circuit 204 are mapped into an object in 
a scene. In parallel to the processing, the texture gener- 
ating circuit 204 deforms the video data obtained from 
the video coding circuit 2 on the basis of the drawing 
command supplied from the scene generating circuit 
201 and the coordinate information supplied from the 
conversion processing circuit 203. The conversion 
processing circuit 203 and the texture generating circuit 
204 execute the respective processing while communi- 
cating data therebetween. 

[0070] Subsequently, the raster circuit 205 converts 
the data from the conversion processing circuit 203 to 
raster data on a pixel basis on the basis of the coordi- 
nate information and the color information of the object 
which are supplied from the conversion processing cir- 
cuit 203, and writes the conversion result into the frame 
buffer 207. The audio signal supplied from the audio
coding circuit 1 is delayed and output by the delay circuit
206. The same operation is also carried out in the
construction of Fig. 45. In this case, the audio signal is not
only delayed; special effects and other effects
can also be easily implemented by the CPU 22.

[0071] There is a case where a time-dependent event is
described in the scene data. Such events are classified
into continuous events, which vary along the time axis,
and discrete events, which are one-shot events on the time
axis. A continuous event is processed as an event
occurring at the time given by the time stamp representing
the composition timing, and a discrete event is processed
as an event occurring when the time stamp representing the
composition timing passes the generation time of the
discrete event. Accordingly, when the same event
processing is carried out according to the time stamp
representing the composition timing at the reception
side, it is ensured that the same composition result is
obtained at both the transmission side and the
reception side.
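The two event rules can be sketched as follows. The linear animation model for the continuous event and both function names are assumptions made for illustration only; the specification does not prescribe them. The point of the sketch is that both rules depend solely on the composition time stamp, so a receiver applying them to the same time stamps reaches the same scene state as the transmitter.

```python
def continuous_value(start_value, rate, composition_ts):
    """Continuous event: evaluated at the composition time stamp itself."""
    return start_value + rate * composition_ts


def fired_discrete_events(event_times, previous_ts, composition_ts):
    """Discrete events fire once the composition time stamp passes their time."""
    return [t for t in event_times if previous_ts < t <= composition_ts]
```

For discrete events scheduled at times 5, 12 and 20, a composition time stamp advancing from 0 to 10 fires only the event at 5, and the next advance from 10 to 20 fires the events at 12 and 20.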

[0072] The specific processing is carried out by the
scene generating circuit 201 of Fig. 3 or the CPU 22 of
Fig. 45. Therefore, the scene generating circuit 201 or
the CPU 22 has a counter or the like inside or
outside thereof for time management. The counter is set
to zero when a session is started; it is
driven with clocks supplied by the clock generating circuit
8 in the case of the scene generating circuit 201,
while it is driven with clocks which exist independently of
the clock generating circuit 8 in the case of the CPU 22.
[0073] The multiplexing circuit 6 multiplexes the
compressed data, the time stamp and the reference clock
value to generate a bit stream. More specifically, as
shown in Fig. 4, in accordance with a predetermined
timing, the multiplexer 31 multiplexes the compressed
data and the time stamp supplied from the audio coding
circuit 1, the compressed data and the time stamp supplied
from the video coding circuit 2, the compressed
data and the time stamp supplied from the scene coding
circuit 4, the time stamp supplied from the composition
circuit 5, the count value of the clocks supplied from the
counter 32 and an overhead representing system
information supplied from the additive information holding
circuit 33.

[0074] The counter 32 counts the clocks supplied from
the clock generating circuit 8, and outputs the count
value thereof. The additive information holding circuit 33
holds not only the overhead representing the system
information, but also multiplexing management information
such as the bit length of each data item to be multiplexed
and of the time stamp, and supplies the information as
control information to the multiplexer 31. As a specific
form of the additive information holding circuit, a ROM
containing predetermined fixed data, a ROM card, or a
RAM into which data are loaded at initialization
through a keyboard or the like may be used.
[0075] Fig. 26 shows a finally-obtained bit stream.
That is, the bit stream comprises the reference clock
value, and the time stamps and compressed data for
audio, video and scene data respectively. Each time stamp
representing the decoding timing is appended to the
corresponding compressed data, and the time stamp
representing the composition timing is selectively
appended to the compressed video data, to the compressed
scene data, or outside the compressed data, as in
the case of the reference clock.
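The arrangement of such a bit stream can be illustrated with a minimal sketch that builds a list of records rather than actual bits. The record layout, the function name multiplex and the attach_to parameter are hypothetical; the sketch only reflects the rule that each decoding time stamp travels with its compressed data while the composition time stamp goes with the video data, with the scene data, or outside the compressed data.

```python
def multiplex(clock_value, units, composition_ts, attach_to="video"):
    """Build one multiplexed record list.

    units maps "audio", "video" and "scene" to (decoding_ts, payload).
    attach_to is "video", "scene", or None to carry the composition
    time stamp outside the compressed data.
    """
    if attach_to not in ("video", "scene", None):
        raise ValueError("attach_to must be 'video', 'scene' or None")
    stream = [("clock", clock_value)]          # reference clock value
    if attach_to is None:
        stream.append(("composition_ts", composition_ts))
    for media in ("audio", "video", "scene"):
        decoding_ts, payload = units[media]
        record = {"media": media, "decoding_ts": decoding_ts, "payload": payload}
        if media == attach_to:
            record["composition_ts"] = composition_ts
        stream.append(("unit", record))
    return stream
```

A real multiplexer would additionally insert the overhead and follow the bit lengths held by the additive information holding circuit 33; these are omitted here.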

[0076] The display circuit 7 performs display and
reproduction of the composite picture signal and the
audio signal supplied from the composition circuit 5,
whereby a transmitter can observe, on the spot, a picture
that it wishes to compose and the audio signal
thereof. Further, the scene can be suitably updated
through the interface circuit 3. The clock generating circuit
8 continues to generate clocks (CLK) in a coherent
way, and supplies the clocks thus generated to the
audio coding circuit 1, the video coding circuit 2, the
scene coding circuit 4, the composition circuit 5 and the
multiplexing circuit 6.

[0077] In the coding apparatus of the first embodiment
according to the present invention, no consideration is
given to the delay needed for the composition processing.
That is, when all the processing is carried out while
the frame rates thereof are kept within given limits, the
time chart representing the processing flow for coding,
decoding and composition at the coding apparatus side
is shown in Fig. 28. Here, the coding corresponds to the
processing of the encoder in the coding circuit, and the
decoding corresponds to the processing of the decoder
in the coding circuit, that is, the creation of the
locally-decoded data. The composition corresponds to the
processing of the composition circuit. The time period
from the start time of a coding operation to the start time
of the next coding operation corresponds to the frame
rate of the input video signal. Further, the time period
from the start time of a composition operation to the
start time of the next composition operation corresponds
to the frame rate of the composite picture. In
Fig. 28, the coding, the decoding and the composition
are expressed as sequential processing. However, by
dividing each of the coding and decoding operations
into plural threads, parallel processing of plural signals
may be supported. An example of the occurrence
timing of the decoding time stamp and the composition
time stamp is shown in Fig. 28. However, for the purpose
of keeping a fixed delay between the coding apparatus
side and the decoding apparatus side, the
occurrence timing may be set to the time when the
decoding and composition are completed, or to any other
time. In this case, the coding and the decoding are assumed to
be always finished within one frame period.
[0078] On the other hand, when the composition requires
a long time, there is a case where it is required
to continue the composition operation until the time of
the next composition frame as shown in Fig. 29. When
the parallel processing of the coding/decoding and the
composition is not supported, or when the coding/decoding
and the composition cannot be executed
in parallel due to contention for access to
the memory storing the locally-decoded data, it becomes
difficult to continue either the composition or the
coding/decoding.

[0079] As a countermeasure to the above case, by
performing the coding, decoding and composition
processing according to the time chart of Fig. 30, the
coding/decoding can be continued. That is, when the
composition processing is not completed by the time set
at the coding apparatus side, the coding/decoding
processing of the video frame at that time is paused,
and the extra time corresponding to the pause is
allocated to the composition processing. For the video
data of the paused frame, nothing (including the time
stamp) is transmitted, or the coding is performed on the
assumption that there is no variation between the frame
concerned and the preceding frame. After the composition
of the frame concerned is completed, a next
composition operation is started in accordance with the
frame rate of the composite picture. When the composition
concerned continues until this time point, the
composition circuit itself skips the next composition.
However, the coding of the audio signal is not paused,
because if it were paused, sound quality would be
markedly reduced due to the
occurrence of missing sections.
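The pause rule can be illustrated with a small timing simulation. This is a simplified sketch under stated assumptions: one video input, a composition that starts as soon as coding/decoding of the frame ends, and audio coding left out because it is never paused. The function name schedule and its parameters are hypothetical.

```python
def schedule(comp_times, frame_period, code_time):
    """Return "coded" or "paused" for the video coder in each frame slot.

    comp_times[k] is the composition time that slot k would need; the value
    is unused for a paused slot, since that slot's composition is skipped.
    """
    decisions = []
    busy_until = 0.0                 # time at which the running composition ends
    for k, comp_time in enumerate(comp_times):
        slot_start = k * frame_period
        if busy_until > slot_start:
            # Composition overran into this slot: pause video coding/decoding
            # and give the slot's time to the composition still in progress.
            decisions.append("paused")
            continue
        decisions.append("coded")
        busy_until = slot_start + code_time + comp_time
    return decisions
```

For example, with a frame period of 40, a coding time of 10 and composition times of 20, 50, 20 and 20, the second composition ends at time 100, so the frame slot starting at 80 is paused and coding resumes in the slot starting at 120.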

[0080] Fig. 31 is a time chart for the coding, the 
decoding and the composition when the coding/decod- 
ing for plural input signals is carried out. The cod- 
ing/decoding operation is sequentially carried out on 
two input signals, and then the composition processing 
is carried out. The decoding time stamp and the compo- 
sition time stamp are generated as shown in Fig. 31, 
respectively. 

[0081] Fig. 32 is a time chart when the composition
processing is continued until a first input signal of a next
frame. In this case, as shown in Fig. 33, the coding/decoding
processing of the first input signal is
paused, and for the video data of the paused frame,
nothing (including the time stamp) is transmitted, or
the coding is performed on the assumption that there is
no variation between the frame concerned and the
preceding frame. For a second input signal, the coding/decoding
is carried out, and the composition is
carried out.

[0082] Likewise, Fig. 34 is a time chart when the com- 
position processing is continued until the second input 
signal of the next frame. In this case, as shown in Fig. 
35, the coding/decoding of the first input signal and the 
coding/decoding of the second input signal are paused. 
For the video data of the paused frame, nothing
(including the time stamp) is transmitted, or the coding is
carried out on the assumption that there is no variation 
between the frame concerned and the preceding frame. 
[0083] When the composition processing concerned
is not finished by the time when the next composition
processing is to be carried out, the composition circuit itself
skips the next composition processing. In the
decoding apparatus, the decoding and composition
operations are carried out in accordance with the time
stamp in the bit stream, and thus when no decoding
time stamp exists, the decoding processing is automatically
skipped. Therefore, the frame rate of the video signal
is temporarily reduced; however, the composition
processing is stably performed.
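The skipping behaviour at the decoding apparatus can be sketched as follows. The sketch assumes that each frame slot is represented by a dictionary and that a missing decoding time stamp is modelled as None; this representation, the function name and the use of the payload as a stand-in for real decoding are all hypothetical.

```python
def decode_frames(access_units, previous_frame=None):
    """Return the frame available for composition in each frame slot."""
    shown = []
    for unit in access_units:
        if unit.get("decoding_ts") is None:
            # Nothing was transmitted for this slot: skip decoding and keep
            # the last decoded frame, which lowers the effective frame rate.
            shown.append(previous_frame)
            continue
        previous_frame = unit["payload"]   # stand-in for real decoding
        shown.append(previous_frame)
    return shown
```

A slot without a time stamp therefore repeats the preceding frame, which matches the alternative on the coding side of coding the paused frame as having no variation from the preceding frame.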
[0084] Fig. 5 is a block diagram showing a second 
embodiment of the coding apparatus which is designed 
so that the coding/decoding can be continued even in 
the case where the continuity of the composition is 
requested until the time of the next composite frame. 
[0085] In this embodiment, a scheduling circuit 153 is
newly added to the coding apparatus of the first
embodiment. That is, a video coding circuit 151 has
a control line extending from the scheduling circuit 153
in addition to the construction of the video coding circuit
2 of Fig. 1. In addition to the construction of the
composition circuit 5 of Fig. 1, a composition circuit 152 is
designed so as to output a signal representing the
composition status, that is, whether the composition is
terminated or not, to the scheduling circuit 153. Upon
receiving the composition status signal from the
composition circuit 152, the scheduling circuit 153 controls the
operation of the coding circuit 151.
[0086] Fig. 6 shows the construction of the coding circuit
151, in which the encoder 11 and the decoder 12 of Fig.
2 are replaced by an encoder 154 and a decoder 155,
respectively. The coding operation of the encoder 154 
and the decoding operation of the decoder 155 are 
together controlled on the basis of the input from the 
scheduling circuit 153. 

[0087] Fig. 7 shows a first embodiment of the composition
circuit 152 of Fig. 5, and it is designed in such a
way that the scene generating circuit 201, the conversion
processing circuit 203, the texture generating circuit
204 and the raster circuit 205 of Fig. 3 are replaced
by a scene generating circuit 211, a conversion
processing circuit 212, a texture generating circuit 213
and a raster circuit 214, and an OR circuit 215 is newly
added. Each of the scene generating circuit 211, the
conversion processing circuit 212, the texture generating
circuit 213 and the raster circuit 214 has an output
representing whether the processing thereof is terminated
or not, in addition to the construction of each of
the scene generating circuit 201, the conversion
processing circuit 203, the texture generating circuit 204
and the raster circuit 205 of Fig. 3.
[0088] The OR circuit 215 receives the status inputs from the
scene generating circuit 211, the conversion processing circuit
212, the texture generating circuit 213 and the raster circuit
214, performs an OR operation on the status inputs thus
received, and outputs the OR-operation result. In this case, it
is assumed that "1" is set while processing is in progress and
"0" is set at the termination of the processing.
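By way of illustration only, the OR operation on the four status outputs may be sketched as follows. The function name `composition_status` and the use of Boolean flags are assumptions made for this sketch and do not appear in the drawings.

```python
from typing import Iterable


def composition_status(stage_busy: Iterable[bool]) -> int:
    """Return 1 while any stage is still processing, 0 once all have terminated.

    Assumed stage order: scene generation, conversion processing,
    texture generation, raster (an illustrative convention only).
    """
    return 1 if any(stage_busy) else 0


# One stage (conversion processing) is still busy.
busy_example = composition_status([False, True, False, False])
# Every stage has terminated.
idle_example = composition_status([False, False, False, False])
```

The status is "1" as long as at least one stage reports that it is processing, which matches the convention assumed above.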
[0089] Fig. 46 shows a second embodiment of the composition
circuit 152, and it is constructed so that the interface circuit
21 of Fig. 45 is replaced by an interface circuit 156. In
addition to the construction of the interface circuit 21, the
interface circuit 156 has an output representing the composition
status of the composition circuit 152 to the scheduling circuit
153.
[0090] Next, the operation of the second embodiment of the
coding apparatus according to the present invention will be
described with reference to Figs. 5 to 7 and Fig. 46. The basic
coding operation is the same as that of the circuit of Fig. 1.
However, a signal representing the composition status is
transmitted from the composition circuit 152 to the scheduling
circuit 153. As the signal representing the composition status,
"1" is output when any one or more of the scene generating
circuit 211, the conversion processing circuit 212, the texture
generating circuit 213 and the raster circuit 214 are in
operation, and "0" is output when all of the circuits are at
rest, as shown in Fig. 7.

[0091] In the construction of Fig. 46, the CPU 22 transmits the
same signal to the scheduling circuit 153 through the interface
circuit 156. Upon receiving the signal, the scheduling circuit
153 outputs "1" when the input signal is "1" and outputs "0"
when the input signal is "0". As shown in Fig. 6, the encoder
154 and the decoder 155 receive this signal; the coding circuit
151 does not start the coding/decoding even at a predetermined
timing when the input signal is "1", and starts the
coding/decoding when the input signal is "0".

[0092] In Fig. 28, the coding/decoding is illustrated as being
carried out sequentially, and a problem arises as to whether
data to be decoded exist when the input signal is set to "1" at
the coding start time and to "0" at the decoding start time.
However, this problem can be avoided by presetting the decoding
operation so that it is not carried out when the input signal is
"1" at the coding start time.
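The gating rule described above may be sketched, purely as an illustration, as follows. The names `FrameDecision` and `schedule_frame` are hypothetical, and the sketch assumes that the status signal is sampled once at the coding start time and once at the decoding start time.

```python
from dataclasses import dataclass


@dataclass
class FrameDecision:
    encode: bool  # whether the encoder starts for this frame
    decode: bool  # whether the local decoder starts for this frame


def schedule_frame(status_at_coding_start: int,
                   status_at_decoding_start: int) -> FrameDecision:
    """Decide whether coding and decoding start for one frame.

    A status of 1 means the composition is still in progress;
    a status of 0 means the composition is at rest.
    """
    # Coding starts only when the composition is at rest.
    encode = status_at_coding_start == 0
    # Decoding is preset to be skipped whenever coding was skipped, so the
    # decoder never runs without coded data; otherwise it follows the signal.
    decode = encode and status_at_decoding_start == 0
    return FrameDecision(encode=encode, decode=decode)
```

With this presetting, the case in which the signal is "1" at the coding start time and "0" at the decoding start time yields neither coding nor decoding for that frame.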

[0093] The problem of the composition processing time shown in
Fig. 29 can also be avoided by scheduling the coding operation,
the decoding operation and the composition operation as shown in
Fig. 36. In this case, when the composition has not been
terminated by the coding start timing of the next frame which is
set by the coding apparatus, the coding/decoding is not paused,
but the composition is paused, and then the composition is
resumed at the time when the coding/decoding is finished. When
the composition concerned has still not been terminated by the
next coding start timing, the composition is paused again, and
the composition processing is on standby until the
coding/decoding is finished.

[0094] In the decoding apparatus, the decoding and the
composition are carried out in response to the time stamp in the
bit stream, and thus it is arranged that, in response to the
decoding time stamp, the decoding is started while the
composition is paused, and the composition is resumed at the
time when the decoding is finished. Accordingly, the frame rate
of the composite picture is temporarily reduced; however, the
video signal can be expected to be coded at a fixed frame rate.
This is effective when only the compressed data of the video
signal is afterwards reused for editing or the like.
[0095] Fig. 37 is a diagram showing a countermeas- 
ure based on the scheduling of the coding, the decoding 
and the composition for plural input signals of Figs. 32 
and 34. Basically, the same countermeasure as shown 
in Fig. 36 is taken. 

[0096] Fig. 8 is a block diagram showing a third embodiment of
the coding apparatus according to the present invention, in
which the coding/decoding operation is enabled to continue by
pausing the composition operation in the case where the
continuity of the composition until the time of a next composite
frame is requested.

[0097] In this embodiment, a scheduling circuit 165 is newly
added to the coding apparatus of the first embodiment. An audio
coding circuit 161, a video coding circuit 162 and a scene
coding circuit 163 have the same construction as the audio
coding circuit 1, the video coding circuit 2 and the scene
coding circuit 4 of Fig. 1, respectively, and each of the
circuits is further designed to output to the scheduling circuit
165 a signal representing a coding status, that is, whether the
coding is being carried out or not.

[0098] In addition to the construction of the composition
circuit 5 of Fig. 1, the composition circuit 164 is provided
with a control line extending from the scheduling circuit 165.
The scheduling circuit 165 receives the status inputs from the
coding circuit 161, the coding circuit 162 and the coding
circuit 163 to control the operation of the composition circuit
164.

[0099] Fig. 9 shows the construction of the coding circuits 161,
162 and 163, in which the encoder 11 and the decoder 12 of Fig.
2 are replaced by an encoder 166 and a decoder 167. Further, an
OR circuit 168 is newly provided. In addition to the
construction of the encoder 11 and the decoder 12, each of the
encoder 166 and the decoder 167 is further designed so as to
output to the OR circuit 168 a signal representing whether the
processing thereof is finished or not. The OR circuit 168
receives the status inputs from the encoder 166 and the decoder
167, and outputs the OR output to the scheduling circuit 165. In
this case, it is assumed that "1" is set while processing is in
progress, and "0" is set at the time when the processing is
finished.

[0100] Fig. 10 shows a first embodiment of the composition
circuit 164 of Fig. 8. The scene generating circuit 201, the
conversion processing circuit 203, the texture generating
circuit 204 and the raster circuit 205 of Fig. 3 are replaced by
a scene generating circuit 221, a conversion processing circuit
222, a texture generating circuit 223 and a raster circuit 224,
and further a control circuit 225 is newly added. In addition to
the construction of each of the scene generating circuit 201,
the conversion processing circuit 203, the texture generating
circuit 204 and the raster circuit 205 of Fig. 3, each of the
scene generating circuit 221, the conversion processing circuit
222, the texture generating circuit 223 and the raster circuit
224 is further provided with an input line from the control
circuit 225. The control circuit 225 receives an input from the
scheduling circuit 165 and outputs it to each of the scene
generating circuit 221, the conversion processing circuit 222,
the texture generating circuit 223 and the raster circuit 224 to
control the operation of each circuit.

[0101] Fig. 47 shows a second embodiment of the composition
circuit 164, and in this embodiment the interface circuit 21 of
Fig. 45 is replaced by an interface circuit 169. In addition to
the construction of the interface circuit 21, the interface
circuit 169 is designed so as to receive an input from the
scheduling circuit 165.
[0102] The operation of the third embodiment of the coding
apparatus of the present invention will be described with
reference to Figs. 8 to 10 and Fig. 47. The basic coding
operation is the same as that of the circuit of Fig. 1. However,
each of the audio coding circuit 161, the video coding circuit
162 and the scene coding circuit 163 transmits the coding status
to the scheduling circuit 165. In each of the coding circuit
161, the coding circuit 162 and the coding circuit 163, the
encoder 166 and the decoder 167 output a coding state and a
decoding state, respectively, to the OR circuit 168 as shown in
Fig. 9. The output signal is set to "1" when the encoder
(decoder) is in operation, and to "0" when it is at rest.
Therefore, the output of the OR circuit 168 is set to "1" when
either the encoder or the decoder is in operation, and to "0"
when both the encoder and the decoder are at rest.

[0103] The scheduling circuit 165 receives inputs from the
coding circuits 161 to 163, performs an OR operation on them,
and outputs the OR result. In the composition circuit 164, the
control circuit 225 receives an input from the scheduling
circuit 165 and outputs it to the scene generating circuit 221,
the conversion processing circuit 222, the texture generating
circuit 223 and the raster circuit 224 as shown in Fig. 10. At
the time when the input value from the control circuit 225
changes from "0" to "1", each of the scene generating circuit
221, the conversion processing circuit 222, the texture
generating circuit 223 and the raster circuit 224 stores
intermediate data and pauses the processing thereof. At the time
when the input value changes from "1" to "0", each circuit
recovers the intermediate data and resumes the processing. When
the input value is equal to "1" at all times, each circuit is at
rest. When the input value is equal to "0" at all times, the
processing is started in synchronism with the composition
timing.
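The pause and resume behaviour of one composition stage may be sketched as follows. This is a minimal illustration only: the class `PausableStage`, the helper `run`, and the modelling of processing as a step counter advanced once per tick are assumptions introduced for the sketch.

```python
from typing import List, Optional


class PausableStage:
    """Toy model of one stage (e.g. the raster circuit) driven by a control input."""

    def __init__(self, total_steps: int) -> None:
        self.total_steps = total_steps
        self.done = 0                      # processing steps completed so far
        self.saved: Optional[int] = None   # intermediate data stored on pause
        self.paused = False
        self.prev_ctrl = 0                 # control input seen on the previous tick

    def tick(self, ctrl: int) -> None:
        if self.prev_ctrl == 0 and ctrl == 1:
            # "0" -> "1": store intermediate data and pause.
            self.saved = self.done
            self.paused = True
        elif self.prev_ctrl == 1 and ctrl == 0:
            # "1" -> "0": recover intermediate data and resume.
            if self.saved is not None:
                self.done = self.saved
            self.paused = False
        self.prev_ctrl = ctrl
        if not self.paused and self.done < self.total_steps:
            self.done += 1


def run(ctrl_sequence: List[int], total_steps: int) -> List[int]:
    """Return the progress of a stage after each tick of the control input."""
    stage = PausableStage(total_steps)
    progress = []
    for ctrl in ctrl_sequence:
        stage.tick(ctrl)
        progress.append(stage.done)
    return progress
```

For example, `run([0, 1, 1, 0, 0, 0], 4)` advances one step, holds its progress while the control input is "1", and then completes the remaining three steps after the input returns to "0".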
[0104] In the first to third embodiments of the coding apparatus
according to the present invention, the same clocks are supplied
from the same clock generating circuit for the audio signal, the
video signal and the scene data. However, according to the
system shown in ISO/IEC JTC1/SC29/WG11 N1825 described in the
conventional technique, different clocks may be provided for
each of the audio signal, the video signal and the scene data.
Accordingly, in the coding apparatus of the present invention,
different clocks may be provided for the audio signal, the video
signal and the scene data.
[0105] Fig. 11 shows a fourth embodiment of the coding apparatus
according to the present invention. In the fourth embodiment, a
clock generating circuit is individually provided to each of the
audio coding circuit 1, the video coding circuit 2, the scene
coding circuit 4 and the composition circuit 5 in the first
embodiment. That is, in place of the clock generating circuit 8
of Fig. 1, three clock generating circuits 171, 172 and 173 are
provided. The audio coding circuit 1 is supplied with clocks
(CLK1) from the clock generating circuit 171, the video coding
circuit 2 is supplied with clocks (CLK2) from the clock
generating circuit 172, and the scene coding circuit 4 and the
composition circuit 5 are supplied with clocks (CLK3) from the
clock generating circuit 173.
[0106] In addition to the construction of the multiplexing
circuit 6 of Fig. 1, the multiplexing circuit 174 is designed to
receive clock inputs from the three clock generating circuits
171, 172 and 173.
[0107] Fig. 12 shows the construction of the multiplex- 
ing circuit 174 of Fig. 11. The multiplexing circuit 174 
has three counters 32 in association with the three clock 
generating circuits 171, 172 and 173 in addition to the 
construction of the multiplexing circuit 6 of Fig. 4. A mul- 
tiplexer 175 is designed so as to receive and multiplex 
inputs from the three counters 32 in addition to the con- 
struction of the multiplexer 31 of Fig. 4. 
[0108] Next, the operation of the fourth embodiment of the
coding apparatus according to the present invention will be
described with reference to Figs. 11 to 13. The basic coding
operation is the same as that of the circuit of Fig. 1. The
difference from the circuit of Fig. 1 resides in that the audio
coding circuit 1, the video coding circuit 2, and both the scene
coding circuit 4 and the composition circuit 5 are operated with
the respective clocks supplied from the three different clock
generating circuits 171, 172 and 173, and that the multiplexing
circuit 174 multiplexes the clocks supplied from the three
different clock generating circuits 171, 172 and 173.

[0109] The final bit stream is shown in (1) of Fig. 27. That is,
the bit stream comprises a reference clock value, a time stamp
and compressed data for each of audio, video and scene data.
Each time stamp representing the decoding timing is appended to
the corresponding compressed data, and the time stamp
representing the composition timing is appended to the
compressed scene data, which is an output of the scene coding
circuit 4 operating with the same clock as the composition
circuit 5.
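The arrangement of the bit stream described above may be sketched as follows. The field names and the `make_unit` helper are hypothetical and serve only to illustrate which elementary stream carries the composition time stamp; the actual layout is the one shown in Fig. 27.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamUnit:
    stream: str                    # "audio", "video" or "scene"
    reference_clock: int           # reference clock value of this stream's clock
    decoding_ts: int               # time stamp representing the decoding timing
    composition_ts: Optional[int]  # present only on the stream sharing the
                                   # composition circuit's clock
    payload: bytes                 # compressed data


def make_unit(stream: str, reference_clock: int, decoding_ts: int,
              payload: bytes, composition_ts: int,
              composition_stream: str = "scene") -> StreamUnit:
    """Attach the composition time stamp only to the designated stream.

    composition_stream="scene" models the case in which the scene coding
    circuit shares its clock with the composition circuit.
    """
    cts = composition_ts if stream == composition_stream else None
    return StreamUnit(stream, reference_clock, decoding_ts, cts, payload)
```

Passing `composition_stream="video"` would instead model an arrangement in which the video coding circuit shares its clock with the composition circuit.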

[0110] Fig. 13 shows a fifth embodiment of the coding apparatus
of the present invention. According to the coding apparatus of
this embodiment, three different clock generating circuits 171,
172 and 173 are respectively allocated to the audio coding
circuit 1, the video coding circuit 151, and both the scene
coding circuit 4 and the composition circuit 152 in the coding
apparatus of the second embodiment. The multiplexing circuit 174
has the same construction as the fourth embodiment.
[0111] Fig. 14 shows a sixth embodiment of the coding apparatus
according to the present invention. According to the coding
apparatus of this embodiment, three different clock generating
circuits 171, 172 and 173 are respectively allocated to the
audio coding circuit 161, the video coding circuit 162, and both
the scene coding circuit 163 and the composition circuit 164 in
the coding apparatus of the third embodiment. The multiplexing
circuit 174 has the same construction as the fourth embodiment.

[0112] Fig. 15 shows a seventh embodiment of the coding
apparatus of the present invention. According to the seventh
embodiment, three different clock generating circuits 171, 172
and 173 are respectively allocated to the audio coding circuit
1, both the video coding circuit 2 and the composition circuit
5, and the scene coding circuit 4 in the coding apparatus of the
first embodiment. The multiplexing circuit 174 has the same
construction as the fourth embodiment. The basic coding
operation is the same as that of the circuit of Fig. 1. The
difference from the circuit of Fig. 1 resides in that the audio
coding circuit 1, both the video coding circuit 2 and the
composition circuit 5, and the scene coding circuit 4 are
operated with the respective clocks supplied from the three
different clock generating circuits 171, 172 and 173, and that
the multiplexing circuit multiplexes the clocks supplied from
the three different clock generating circuits 171, 172 and 173.

[0113] The final bit stream is shown in (2) of Fig. 27. That is,
the bit stream comprises a reference clock value, a time stamp
and compressed data for each of audio, video and scene data.
Each time stamp representing the decoding timing is appended to
the corresponding compressed data, and the time stamp
representing the composition timing is appended to the
compressed video data, which is an output of the video coding
circuit 2 operating with the same clocks as the composition
circuit 5.

[0114] Fig. 16 shows an eighth embodiment of the coding
apparatus according to the present invention. According to the
eighth embodiment, three different clock generating circuits
171, 172 and 173 are respectively allocated to the audio coding
circuit 1, both the video coding circuit 151 and the composition
circuit 152, and the scene coding circuit 4 in the coding
apparatus of the second embodiment. The multiplexing circuit 174
has the same construction as the fourth embodiment.
[0115] Fig. 17 shows a ninth embodiment of the coding apparatus
according to the present invention. According to the ninth
embodiment, three different clock generating circuits 171, 172
and 173 are respectively allocated to the audio coding circuit
161, both the video coding circuit 162 and the composition
circuit 164, and the scene coding circuit 163 in the coding
apparatus of the third embodiment of the present invention. The
multiplexing circuit 174 has the same construction as the fourth
embodiment.

[0116] Fig. 18 is a block diagram showing a first embodiment of
the decoding apparatus of the present invention. The decoding
apparatus of the present invention comprises a separation
circuit (demultiplexing circuit) 41, a decoding circuit 42 for
audio signals (hereinafter referred to as "audio decoding
circuit"), a decoding circuit 43 for video signals (hereinafter
referred to as "video decoding circuit"), a decoding circuit 44
for scene data (hereinafter referred to as "scene decoding
circuit"), a composition circuit 45, a display circuit 46, a
clock generating circuit 47 and an interaction circuit 48.

[0117] The separation circuit 41 outputs from an input bit
stream the compressed data and the time stamp representing the
decoding timing for the audio signal, the compressed data and
the time stamp representing the decoding timing for the video
signal, the compressed data and the time stamp for the scene
data, the time stamp representing the composition timing and a
reference clock value supplied to the clock generating circuit
47 (described later).

[0118] The audio decoding circuit 42 decodes the compressed data
input from the separation circuit 41 at the time represented by
the time stamp representing the decoding timing which is input
from the separation circuit 41. The video decoding circuit 43
decodes the compressed data input from the separation circuit 41
at the time represented by the time stamp representing the
decoding timing which is input from the separation circuit 41.
The scene decoding circuit 44 decodes the compressed data input
from the separation circuit 41 at the time represented by the
time stamp representing the decoding timing which is input from
the separation circuit 41.
[0119] The composition circuit 45 performs the composition
processing on the audio signal from the audio decoding circuit
42, the video signal from the video decoding circuit 43 and the
scene data from the scene decoding circuit 44 input thereto in
accordance with a scene description described in the scene data
at the time represented by the time stamp representing the
composition timing input from the separation circuit 41, and
outputs a composite picture and the audio signal. Further, it
accepts input data from the interaction circuit 48 (described
later) to implement user interaction such as movement of a
viewing point.

[0120] The display circuit 46 receives the composite picture
signal and the audio signal from the composition circuit 45, and
displays/reproduces these signals through a display or the like
for pictures and through a speaker or the like for sounds. The
clock generating circuit 47 generates clocks (CLK10) in
accordance with the reference clock value supplied from the
separation circuit 41, and supplies the clocks to the audio
decoding circuit 42, the video decoding circuit 43, the scene
decoding circuit 44 and the composition circuit 45. The clock
generating circuit 47 is generally constructed as a PLL
(Phase-Locked Loop), and the reference clock value is used to
control the oscillation frequency of the clocks.

[0121] The interaction circuit 48 accepts an interaction such as
a keyboard input, a mouse input or the like from a viewer to
convert it to data representing movement of a viewing point or
the like, and outputs the conversion result to the composition
circuit 45.
[0122] Fig. 19 shows the construction of the separation circuit
41 of Fig. 18, and it comprises a buffer 51, a demultiplexer 52
and an additive information holding circuit 53. The buffer 51
buffers a bit stream which is transmitted through a network or
read out from a storage medium such as a disk or the like. The
demultiplexer 52 separates the bit stream input from the buffer
51 into the compressed data and the time stamp representing the
decoding timing for the audio information, the compressed data
and the time stamp representing the decoding timing for the
video information, the compressed data and the time stamp
representing the decoding timing for the scene data, the time
stamp representing the composition timing, the reference clock
value and overhead serving as system information, on the basis
of the management information such as bit length which is held
in the additive information holding circuit 53.

[0123] The additive information holding circuit 53 holds not
only the overhead representing the system information, but also
the multiplexing management information such as the bit length
of each data item to be multiplexed and the time stamps, and
supplies these data as control information to the demultiplexer
52. Specific forms of the additive information holding circuit
53 may include a ROM containing predetermined fixed data, a ROM
card, a RAM into which data are loaded through a keyboard or the
like at an initialization time, a RAM for storing bit stream
information contained in the overhead serving as the system
information in the bit stream, or the like.
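As a rough illustration of how management information such as bit lengths can drive the separation, consider the following sketch. The field layout, the field names and the representation of the bit stream as a string of "0" and "1" characters are assumptions made for this example only; the actual multiplexing format is not specified here.

```python
from typing import Dict, List, Tuple


def demultiplex(bits: str, layout: List[Tuple[str, int]]) -> Dict[str, int]:
    """Split a bit string into named fields using (name, bit_length) pairs.

    `layout` plays the role of the management information held in the
    additive information holding circuit (hypothetical format).
    """
    fields: Dict[str, int] = {}
    pos = 0
    for name, length in layout:
        chunk = bits[pos:pos + length]
        if len(chunk) != length:
            raise ValueError(f"bit stream too short for field {name!r}")
        fields[name] = int(chunk, 2)  # interpret the field as an unsigned integer
        pos += length
    return fields


# Hypothetical layout: 8-bit reference clock, 8-bit decoding time stamp, 4-bit data.
example_layout = [("reference_clock", 8), ("decoding_ts", 8), ("payload", 4)]
example_fields = demultiplex("00000011" + "00000101" + "1010", example_layout)
```

Because the lengths are supplied as control information rather than hard-coded, the same routine can serve a fixed table (the ROM case) or a table loaded at initialization (the RAM case).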

[0124] Fig. 20 shows the construction of the decoding circuits
42, 43 and 44 of Fig. 18, each of which comprises a buffer 61, a
buffer 62, a decoder 63 and a memory 64. The buffer 61 buffers a
time stamp representing a decoding timing which is supplied from
the separation circuit 41. The buffer 62 buffers the compressed
data supplied from the separation circuit 41. The decoder 63
receives the compressed data supplied from the buffer 62 and the
decoding data supplied from the memory 64 (described later), and
performs the decoding operation at the time of the time stamp
representing the decoding timing supplied from the buffer 61.
The decoder 63 is supplied with clocks from the clock generating
circuit 47.
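The interplay of the two buffers, the decoder and the memory may be sketched as follows. The class name `DecodingCircuit` and the toy "decoding" rule (adding a difference value to the previously decoded value) are assumptions chosen only to show decoding being triggered by the time stamp and using previously decoded data.

```python
from collections import deque
from typing import Deque, List


class DecodingCircuit:
    """Toy model of one decoding circuit driven by decoding time stamps."""

    def __init__(self) -> None:
        self.ts_buffer: Deque[int] = deque()    # role of buffer 61: time stamps
        self.data_buffer: Deque[int] = deque()  # role of buffer 62: compressed data
        self.memory = 0                         # role of memory 64: last decoded value

    def push(self, decoding_ts: int, compressed: int) -> None:
        """Store one unit received from the separation circuit."""
        self.ts_buffer.append(decoding_ts)
        self.data_buffer.append(compressed)

    def tick(self, clock: int) -> List[int]:
        """Decode every buffered unit whose decoding time has been reached."""
        decoded = []
        while self.ts_buffer and self.ts_buffer[0] <= clock:
            self.ts_buffer.popleft()
            delta = self.data_buffer.popleft()
            # Toy predictive decoding: new value = previous decoded value + delta.
            self.memory = self.memory + delta
            decoded.append(self.memory)
        return decoded
```

A unit pushed with time stamp 10 is left in the buffers while the clock reads 5 and is decoded once the clock reaches 10.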
[0125] The memory 64 stores the decoding data supplied from the
decoder 63. In this construction, the decoding operation of the
decoder 63 is carried out on the assumption that the decoding
data stored in the memory 64 are used. However, there is a case
where the decoding data are not used, as in the case of
intra-frame coding of video. In the case of scene data, text
data that are not compressed may be considered; in this case,
the data are merely written into the memory 64 with no
modification.

[0126] Fig. 21 shows a first embodiment of the composition
circuit 45 of Fig. 18. According to this embodiment, in the
construction of Fig. 3, the scene generating circuit 201 is
replaced by a scene generating circuit 231 and the buffer 202 is
replaced by a buffer 232, and a buffer 233 is further added. The
scene generating circuit 231 is designed so that the output line
of the time stamp representing the composition timing is removed
from the scene generating circuit 201 and, in place of the
output line thus removed, input lines from the buffer 232 and
the buffer 233 are added. The buffer 232 buffers the time stamp
representing the composition timing from the separation circuit
41. The buffer 233 buffers interaction data from the interaction
circuit 48. The clocks from the clock generating circuit 47 are
supplied to the scene generating circuit 231, the conversion
processing circuit 203, the texture generating circuit 204 and
the raster circuit 205.

[0127] Fig. 48 shows a second embodiment of the composition
circuit 45. In the construction of Fig. 48, the interface
circuit 21 of Fig. 45 is replaced by an interface circuit 49.
The interface circuit 49 is designed so that the output line to
the multiplexing circuit 6 is removed from the interface circuit
21 of Fig. 45, and in place of the output line thus removed an
input line for the time stamp representing the composition
timing from the separation circuit 41 and an input line for
interaction data from the interaction circuit 48 are newly
added.
[0128] Next, the operation of the decoding apparatus according
to the present invention will be described with reference to
Figs. 18 to 21 and Fig. 48. The separation circuit 41 separates
the bit stream input thereto into the compressed data and the
time stamp representing the decoding timing for the audio
signal, the compressed data and the time stamp representing the
decoding timing for the video signal, the compressed data and
the time stamp representing the decoding timing for the scene
data, the time stamp representing the composition timing and the
reference clock value supplied to the clock generating circuit
47 described later.
[0129] As shown in Fig. 19, in the separation circuit 41, the
buffer 51 first buffers the bit stream input. Subsequently, the
demultiplexer 52 separates the bit stream supplied from the
buffer 51 into the compressed data and the time stamp
representing the decoding timing for the audio signal, the
compressed data and the time stamp representing the decoding
timing for the video signal, the compressed data and the time
stamp representing the decoding timing for the scene data, the
time stamp representing the composition timing, the reference
clock value supplied to the clock generating circuit 47
described later and the overhead information of a system header
portion on the basis of an initialization set value or control
information supplied from the additive information holding
circuit 53 for holding the bit stream information contained in
the system header portion of the bit stream. The additive
information holding circuit 53 stores the overhead information
of the system header portion supplied from the demultiplexer 52
as occasion demands.
[0130] Next, the clock generating circuit 47 receives the
reference clock value supplied from the separation circuit 41,
and controls the oscillation frequency in accordance with the
reference clock value to generate and output clocks. However, in
the case of an application in which the decoding apparatus
itself periodically and actively fetches bit streams, for
example, in such a case that the bit streams are contained in a
storage medium attached to the decoding apparatus, the clock
generating circuit 47 may neglect the reference clock value
supplied from the separation circuit 41 and generate clocks at
its own oscillation frequency as in the case of the clock
generating circuit 8.
[0131] Next, each of the audio decoding circuit 42, the video
decoding circuit 43 and the scene decoding circuit 44 executes
the corresponding decoding operation on the compressed data at
the time given by the corresponding time stamp representing the
decoding timing. As shown in Fig. 20, the decoder 63 first
performs the decoding operation by using the compressed data
given from the buffer 62 and the decoding data given from the
memory 64, and newly writes the decoding data thus created into
the memory 64. At this time, the clocks (CLK10) are supplied
from the clock generating circuit 47 to each of the audio
decoding circuit 42, the video decoding circuit 43 and the scene
decoding circuit 44.
[0132] Next, the composition circuit 45 performs the composition
processing at the time of the time stamp representing the
composition timing supplied from the separation circuit 41 by
using the audio data obtained from the audio decoding circuit
42, the video data obtained from the video decoding circuit 43
and the scene data obtained from the scene decoding circuit 44.
In this case, the decoding data stored in the memory of each
decoding circuit may be used directly as the respective data.
Further, an interaction such as movement of the viewing point is
reflected in the composite pictures and audio in accordance with
the interaction data given from the interaction circuit 48.

[0133] The operation of Fig. 21 showing the first embodiment of
the composition circuit 45 is basically the same as that of the
circuit of Fig. 3. However, the scene generating circuit 231
starts the composition processing at the time of the time stamp
representing the composition timing given from the buffer 232,
and it creates a scene graph by using the scene data given from
the






decoding circuit 44 and the interaction data given from the
buffer 233 as in the case of the scene generating circuit 201,
and then outputs a scene drawing command and intermediate data.
The start of the operation of the other circuits can be
supported by providing additional control lines or by setting
the drawing command transmission time as the processing start
time.
[0134] The operation of Fig. 48 showing the second embodiment of
the composition circuit 45 is basically the same as that of the
circuit of Fig. 45. However, the CPU 22 starts the composition
processing at the time of the time stamp representing the
composition timing given from the separation circuit 41 through
the interface circuit 49.
[0135] The operation of the display circuit 46 is the same as
that of the display circuit 7 shown in Fig. 1. A viewer applies
an interaction to the displayed composite picture signal and
audio signal through a keyboard, a mouse or the like, and the
result is input to the interaction circuit 48.

[0136] Fig. 38 is a time chart showing the relationship among
data of the buffer in the decoding circuit of the decoding
apparatus of Fig. 18, the decoding processing on the data, data
of the memory in the decoding circuit, the composition
processing on the data and the final composite picture. The
input compressed data are assumed to be first compressed video
data, second compressed video data and scene data. The decoding
operation on the respective data is started at the time of the
time stamp representing the decoding timing. The data are read
out from the buffer and the decoding processing is executed, and
the decoding data thus obtained are written into the memory.
Subsequently, the composition processing is started at the time
of the time stamp representing the composition timing, and the
respective decoding data are simultaneously read out from the
memory and the composition processing is executed. The composite
picture thus obtained is displayed. Fig. 39 is a time chart
showing the flow of the decoding processing and the composition
processing.
[0137] Fig. 39 shows a case where the processing speed of the
decoding apparatus is sufficiently high and the composition is
terminated within the time estimated by the coding apparatus.
However, when the processing speed of the decoding apparatus is
not sufficient, there is a case where the composition processing
needs a longer time than the time estimated by the coding
apparatus. Fig. 40 is a time chart for the case where the
composition processing in the decoding apparatus needs a time
exceeding the estimated time.

[0138] As a countermeasure to the above case, the
decoding and composition processing shown in the
time chart of Fig. 41 can be performed. That is, when
the composition processing has not been terminated
by the time set at the coding apparatus side, the
composition is paused at that time point (that is, the time
stamp representing the composition timing is ignored),
and the composition is resumed at the termination time
of the decoding operation. When the composition
concerned has still not been terminated by the next
decoding start timing, the composition is paused
again and stands by until the decoding is terminated.
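
The pause-and-resume behaviour described for Fig. 41 can be sketched as a small scheduling routine. This is an illustrative assumption, not the circuit of the specification: the function name `schedule_composition`, the integer tick units and the representation of decoding operations as `(start, end)` windows are all hypothetical.

```python
def schedule_composition(cts, work, decode_windows):
    """Return the time segments in which composition actually runs.

    cts            -- composition start time (ticks)
    work           -- ticks of composition processing required (> 0)
    decode_windows -- sorted, non-overlapping (start, end) decoding intervals
    Composition is paused whenever a decoding window begins and is
    resumed when that decoding operation terminates.
    """
    segments = []
    t = cts
    remaining = work
    for start, end in decode_windows:
        if end <= t:
            continue  # this decoding operation finished before time t
        if start > t:
            # Run composition until it finishes or the next decoding starts.
            run = min(remaining, start - t)
            segments.append((t, t + run))
            remaining -= run
            if remaining == 0:
                return segments
        # Composition is paused while decoding runs; resume at its end.
        t = end
    # No further decoding: finish the remaining composition work.
    segments.append((t, t + remaining))
    return segments


# Composition needs 25 ticks but decoding starts again at 20 and at 40.
print(schedule_composition(10, 25, [(20, 30), (40, 50)]))
# Fast decoder: 5 ticks of composition fit before the next decoding.
print(schedule_composition(10, 5, [(20, 30)]))
```

In the first call the composition is split across three segments, which corresponds to the reduced composition frame rate mentioned in the text; in the second call it completes in one segment, as in Fig. 39.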

[0139] With respect to the audio signal and the video
signal, the immediately preceding decoded data are used
for the next decoding operation, and thus skipping the
decoding processing causes a reduction in quality.
Therefore, by pausing the composition processing as
described above, composition that causes no reduction
in quality of the audio signal and the video signal
can be implemented, although the frame rate of the
composition is reduced. However, when the pause of the
composition causes the audio signal to be missed in the
reproduction operation, it causes a great reduction in
quality. Therefore, the reproduction of the audio signal
in the composition is arranged not to be paused.
[0140] Fig. 42 is a time chart for the normal decoding
and composition when plural input data exist, Fig. 43
is a time chart for the decoding and composition showing
occurrence of the same problem as in Fig. 40 when
plural input data exist, and Fig. 44 is a time chart for the
decoding and composition showing the same
countermeasure as in Fig. 41 when plural
input data exist.

[0141] Fig. 22 is a block diagram showing a second
embodiment of the decoding apparatus of the present
invention. In this embodiment, the separation circuit 41
of Fig. 18 is replaced by a separation circuit 181, and
different clock generating circuits 182, 183 and 184 are
individually allocated to the decoding circuit 42 for the
compressed audio data, the decoding circuit 43 for the
compressed video data, and both the decoding circuit
44 for the compressed scene data and the composition
circuit 45, respectively. The separation circuit 181 is
basically the same as the separation circuit 41, but it is
designed to output three reference clock values. The
operation of each of the clock generating circuits 182,
183 and 184 is the same as that of the clock generating
circuit 47, and their oscillation frequencies are controlled
with the respective reference clock values given from the
separation circuit 181.

[0142] As shown in Fig. 23, the separation circuit 181 
is designed so that the demultiplexer 52 of Fig. 19 is 
replaced by a demultiplexer 185. The demultiplexer 185 
has three output lines for reference clock values. 
[0143] Next, the operation of the circuit of Fig. 22 will
be described. The basic operation is the same as that
of the circuit of Fig. 18. The difference is
that the decoding circuit 42 for the compressed audio
data (hereinafter referred to as "compressed audio
decoding circuit"), the decoding circuit 43 for the
compressed video data (hereinafter referred to as
"compressed video decoding circuit"), and both the decoding
circuit 44 for the compressed scene data (hereinafter
referred to as "compressed scene decoding circuit")
and the composition circuit 45 are respectively operated
with clocks (CLK11), (CLK12) and (CLK13) supplied
from the three different clock generating circuits 182,
183 and 184, and the separation circuit
181 separates and outputs the three different reference
clock values.

[0144] Fig. 24 is a block diagram showing a third
embodiment of the decoding apparatus of the present
invention. In this embodiment, the separation circuit 41
of Fig. 18 is replaced by the separation circuit 181.
Further, the different clock generating circuits 182, 183 and
184 are individually allocated to the compressed audio
decoding circuit 42, both of the compressed video
decoding circuit 43 and the composition circuit 45, and
the compressed scene decoding circuit 44, respectively.
The separation circuit 181 and the clock generating
circuits 182, 183 and 184 are the same as in the second
embodiment of Fig. 22.

[0145] Next, the operation of the circuit of Fig. 24 will 
be described. 

[0146] The basic operation is the same as that of the
circuit of Fig. 18. The difference is that the
compressed audio decoding circuit 42, both of the
compressed video decoding circuit 43 and the composition
circuit 45, and the compressed scene decoding circuit
44 are operated with the clocks (CLK11, CLK12,
CLK13) supplied from the three different clock
generating circuits 182, 183 and 184, respectively, and the
separation circuit 181 separates and outputs the three
different reference clock values.
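
The only difference between the second embodiment (Fig. 22) and the third embodiment (Fig. 24) is which clock drives the composition circuit. A minimal sketch of that allocation follows; the dictionary keys, the function name `route_reference_clocks` and the example reference values are hypothetical and serve only to make the mapping explicit.

```python
# Which clock drives which circuit in each embodiment (from the text above).
# The circuit names used as keys are illustrative labels, not reference numerals.
CLOCK_DOMAINS = {
    "second": {            # Fig. 22: composition shares the scene clock
        "audio_decoder": "CLK11",
        "video_decoder": "CLK12",
        "scene_decoder": "CLK13",
        "composition": "CLK13",
    },
    "third": {             # Fig. 24: composition shares the video clock
        "audio_decoder": "CLK11",
        "video_decoder": "CLK12",
        "scene_decoder": "CLK13",
        "composition": "CLK12",
    },
}


def route_reference_clocks(embodiment, ref_values):
    """Map each circuit to the reference clock value that controls its clock.

    ref_values -- dict with keys "CLK11", "CLK12", "CLK13" holding the three
                  reference clock values output by the separation circuit.
    """
    table = CLOCK_DOMAINS[embodiment]
    return {circuit: ref_values[clock] for circuit, clock in table.items()}


refs = {"CLK11": 48_000, "CLK12": 90_000, "CLK13": 60_000}  # made-up values
print(route_reference_clocks("second", refs)["composition"])
print(route_reference_clocks("third", refs)["composition"])
```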
[0147] Fig. 25 is a block diagram showing an
embodiment of the coding/decoding system in which the
coding apparatus and the decoding apparatus according to
the present invention are linked to each other through a
transmission/storage system. In Fig. 25, the
coding/decoding system comprises coding apparatus 191,
decoding apparatus 192 and a transmission/storage
system.

[0148] The coding apparatus 191 first receives the
audio signal, the video signal and the scene data,
performs the coding operation on these data, multiplexes
the coded data to form a bit stream, and then
transmits the multiplexed data to the transmission/storage
system. The decoding apparatus 192
decodes the bit stream transmitted from the
transmission/storage system, receives an interaction from a
viewer to perform the composition processing, and then
outputs the composite picture and the audio signal.
[0149] As described above, according to the present
invention, by using the time stamp representing the
composition timing, a desired composite picture can be
formed at the coding apparatus side and the synchronous
reproduction can be performed at the decoding
apparatus side. Further, when plural video signals or
scene data exist and the coding/decoding is displaced
in phase between these signals or data, the time stamp
representing the composition timing is added to a
stream of them to manage the composition timing in the
decoding apparatus. Further, in accordance with the
complexity of the composition, the decoding operation and
the composition operation of the decoding apparatus
can be controlled at the coding apparatus side.
[0150] It is not always necessary to provide both the
time stamp representing the decoding timing and the
time stamp representing the composition timing; a single
flag may instead indicate whether the stream concerned
is a stream for managing the composition processing.
Use of the flag avoids the need to insert the time
stamp representing the composition timing into the bit
stream, and thus the bit amount can be reduced. In this
case, it is assumed that the decoding timing and the
composition timing coincide with each other.
[0151] Fig. 49 shows an embodiment of the bit stream
of the present invention when the 1-bit flag described
above is used. A 1-bit flag is added to the time stamp
representing the decoding timing which is appended to
each of the compressed audio data, the compressed
video data and the compressed scene data, and then
the multiplexing operation is carried out to generate a bit
stream.

[0152] It is assumed that when the flag is "0",
the time stamp representing the decoding
timing does not double as the time stamp representing
the composition timing, while when the flag is "1", the
time stamp representing the decoding timing doubles as
the time stamp representing the composition timing.
[0153] Fig. 50 shows another embodiment of the bit
stream according to the present invention in which the
1-bit flag is added to the reference clock value and the
time stamp representing the decoding timing. The 1-bit
flag is added to the reference clock value and the time
stamp representing the decoding timing which is
appended to each of the compressed audio data, the
compressed video data and the compressed scene
data, and the multiplexing operation is carried out to
generate a bit stream.

[0154] It is assumed that when the flag is "0", the time
stamp representing the decoding timing does not double
as the time stamp representing the composition timing,
while when the flag is "1", the time stamp
representing the decoding timing doubles as the time
stamp representing the composition timing.
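
The flag semantics of Figs. 49 and 50 can be sketched as a pair of pack/unpack helpers. The specification does not give field widths or bit positions, so the 33-bit time stamp width (borrowed from MPEG-2 Systems time stamps) and the placement of the flag immediately above the time stamp are assumptions, as are the function names.

```python
TS_BITS = 33  # assumed time stamp width; not stated in the specification


def pack_time_info(dts, doubles_as_cts):
    """Pack a decoding time stamp and the 1-bit flag into one integer field."""
    if not 0 <= dts < (1 << TS_BITS):
        raise ValueError("time stamp does not fit in the assumed field width")
    flag = 1 if doubles_as_cts else 0
    return (flag << TS_BITS) | dts


def unpack_time_info(field):
    """Return (dts, cts); cts is None when the flag is 0."""
    flag = (field >> TS_BITS) & 1
    dts = field & ((1 << TS_BITS) - 1)
    # Flag "1": the decoding time stamp doubles as the composition time stamp.
    cts = dts if flag == 1 else None
    return dts, cts


print(unpack_time_info(pack_time_info(90_000, True)))
print(unpack_time_info(pack_time_info(90_000, False)))
```

With the flag set, no separate composition time stamp needs to be carried, which is the reduction in bit amount described above.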
[0155] Fig. 51 is a block diagram showing a tenth 
embodiment of the coding apparatus according to the 
present invention. 

[0156] According to this embodiment, in the construction
of Fig. 5, the video coding circuit 151, the scene
coding circuit 4, the composition circuit 152 and the
multiplexing circuit 6 are replaced by a coding circuit 241, a
coding circuit 242, a composition circuit 243 and a
multiplexing circuit 244, respectively.

[0157] Next, the operation of the circuit of Fig. 51 will 
be described. 

[0158] The operation of the circuit of Fig. 51 is
basically the same as that of Fig. 5. However, the video
coding circuit 241 and the scene coding circuit 242 set the
flag of the bit stream of the present invention to "1" and
output it as time information together with the time
stamps representing the decoding timing when the
streams thereof carry the composition timing.
Conversely, when the streams do not carry the composition
timing, the flag of the bit stream of the present invention
is set to "0" and output as time information together
with the time stamp representing the decoding timing.
The composition circuit 243 outputs the composition
status as in the case of the composition circuit 152 of
Fig. 5. On the other hand, when the composition
processing of the composition circuit 243 is not
terminated, the video coding circuit 241 or the scene coding
circuit 242 sets the flag of the bit stream of the present
invention to "0" and outputs it as time information
together with the time stamp representing the decoding
timing even if the stream originally carries the
composition timing. The multiplexing circuit 244 generates and
outputs the bit stream according to the present
invention.
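
The flag decision made by the coding circuits 241 and 242 reduces to a two-input rule, sketched below. The function and parameter names are hypothetical; only the rule itself comes from the paragraph above.

```python
def composition_flag(carries_composition_timing, composition_terminated):
    """Return the 1-bit flag a coding circuit would attach to its time stamp.

    The flag is "1" only when the stream carries the composition timing AND
    the composition circuit reports that its processing has terminated.
    Otherwise it is forced to "0", even for a stream that originally
    carries the composition timing.
    """
    return 1 if (carries_composition_timing and composition_terminated) else 0


for carries in (True, False):
    for done in (True, False):
        print(carries, done, composition_flag(carries, done))
```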

[0159] Fig. 52 is a block diagram showing a fourth
embodiment of the decoding apparatus according to the
present invention. In this embodiment, in the construction
of Fig. 18, the separation circuit 41 is replaced by a
separation circuit 251. The separation circuit 251 copies
and outputs the time stamp representing the decoding
timing of a stream which carries the composition timing.
[0160] Next, the operation of the circuit of Fig. 52 will 
be described. 

[0161] The operation of the circuit of Fig. 52 is
basically the same as that of Fig. 18. However, according to
the flag of the bit stream of the present invention, the
separation circuit 251 copies and outputs the time stamp
representing the decoding timing of a stream which
carries the composition timing. The composition circuit 45
starts the composition operation in accordance with the
time stamp. In practice, however, it waits until the
termination of the processing of the decoding circuit which
decodes the stream carrying the composition timing,
and starts the composition processing just after the
termination of the processing.
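
The behaviour of the separation circuit 251 and the composition circuit 45 can be sketched as follows. This is an assumption-laden illustration: the tuple layout `(stream, dts, flag)`, the `decode_duration` table and the function name are hypothetical, and the sketch handles only a single flagged stream.

```python
def composition_start_time(access_units, decode_duration):
    """Return the time at which composition actually starts, or None.

    access_units    -- iterable of (stream, dts, flag) tuples
    decode_duration -- dict mapping stream name to decoding time in ticks
    """
    flagged = [(stream, dts) for stream, dts, flag in access_units if flag == 1]
    if not flagged:
        return None  # no stream carries the composition timing
    stream, dts = flagged[0]
    # Separation circuit: copy the decoding time stamp as the composition
    # time stamp of the flagged stream.
    cts = dts
    # Composition circuit: wait until the decoder for that stream terminates.
    decode_end = dts + decode_duration[stream]
    return max(cts, decode_end)


units = [("audio", 0, 0), ("video", 100, 1), ("scene", 100, 0)]
print(composition_start_time(units, {"audio": 5, "video": 30, "scene": 10}))
```

Because the copied time stamp equals the decoding start time, the effective start is the moment the flagged stream finishes decoding.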
[0162] Further, the coding apparatus and the decod- 
ing apparatus shown in Figs. 51 and 52 may be linked to 
each other to fabricate the coding/decoding system 
shown in Fig. 25. 

[0163] According to the coding apparatus of the
present invention, the time stamp representing the
composition timing is added to the bit stream. Therefore, the
generation of a desired composite picture at the
coding side can be ensured, and the stream data that are
transmitted continuously on time axis can be supported.
In addition, the coding/decoding synchronous
reproduction of audio signals, video signals and artificial
scene data can be implemented with supporting the
interaction function at the decoding side.
[0164] According to the second embodiment of the
coding apparatus of the present invention, when the
composition load is high, the coding processing of the
video signal is controlled and the time stamp representing
the composition timing is added to the bit stream.
Therefore, the generation of a composite picture desired at
the coding side can be ensured and the stream data
that are transmitted continuously on time axis can be
supported. In addition, the coding/decoding synchronous
reproduction of audio signals, video signals and
artificial scene data can be implemented with supporting
the interaction function at the decoding side and
without reducing the composition frame rate.
[0165] According to the third embodiment of the cod- 
ing apparatus of the present invention, when the com- 
position load is high, the composition processing is 
controlled, and the time stamp representing the compo- 
sition timing is added to the bit stream. Therefore, the 
generation of a composite picture desired at the coding 
side can be ensured and the stream data that are trans- 
mitted continuously on time axis can be supported. In 
addition, the coding/decoding synchronous reproduc- 
tion of audio signals, video signals and artificial scene 
data can be implemented with supporting the interac- 
tion function at the decoding side and without reducing 
the frame rate of video signal. 

[0166] According to the fourth embodiment of the coding
apparatus of the present invention, the same clocks
are supplied to the composition circuit and the coding 
circuit for artificial scene data, and the time stamp rep- 
resenting the composition timing is added to the com- 
pressed data of the artificial scene data to generate a bit 
stream. Therefore, the generation of a composite pic- 
ture desired at the coding side can be ensured and the 
stream data that are transmitted continuously on time 
axis can be supported. In addition, the coding/decoding 
synchronous reproduction of audio signals, video sig- 
nals and artificial scene data when the coding is per- 
formed with clocks which are different among the audio 
signal, the video signal and the artificial scene data can 
be implemented with supporting the interaction function 
at the decoding side. 

[0167] According to the fifth embodiment of the coding
apparatus of the present invention, when the composi- 
tion load is high, the coding processing of the video sig- 
nal is controlled, the same clocks are supplied to the 
composition circuit and the coding circuit for the artificial 
scene data, and the time stamp representing the com- 
position timing is appended to the compressed data of 
the artificial scene data to generate a bit stream. There- 
fore, the generation of a composite picture desired at 
the coding side can be ensured and the stream data 
that are transmitted continuously on time axis can be 
supported. In addition, the coding/decoding synchro- 
nous reproduction of audio signals, video signals and 
artificial scene data when the coding is performed with 
clocks which are different among the audio signal, the 
video signal and the artificial scene data can be imple- 
mented with supporting the interaction function at the 
decoding side and without reducing the composition 
frame rate. 

[0168] According to the sixth embodiment of the coding
apparatus of the present invention, when the
composition load is high, the composition processing is
controlled, the same clocks are supplied to the
composition circuit and the coding circuit for the artificial scene
data, and the time stamp representing the composition
timing is appended to the compressed data of the
artificial scene data to generate a bit stream. Therefore, the
generation of a composite picture desired at the coding
side can be ensured and the stream data which are
transmitted continuously on time axis can be supported.
In addition, the coding/decoding synchronous
reproduction of audio signals, video signals and artificial scene
data when the coding is performed with clocks which
are different among the audio signal, the video signal
and the artificial scene data can be implemented with
supporting the interaction function at the decoding side
and without reducing the frame rate of video signal.
[0169] According to the seventh embodiment of the
coding apparatus of the present invention, the same
clocks are supplied to the composition circuit and the
coding circuit for the video signal, and the time stamp
representing the composition timing is appended to the
compressed data of the video signal to generate a bit
stream. Therefore, the generation of a composite
picture desired at the coding side can be ensured and the
stream data that are transmitted continuously on time
axis can be supported. In addition, the coding/decoding
synchronous reproduction of audio signals, video
signals and artificial scene data when the coding is
performed with clocks which are different among the audio
signal, the video signal and the artificial scene data can
be implemented with supporting the interaction function
at the decoding side.

[0170] According to the eighth embodiment of the coding
apparatus of the present invention, when the
composition load is high, the coding processing of the video
signal is controlled, the same clocks are supplied to the
composition circuit and the coding circuit for the video
signal, and the time stamp representing the composition
timing is appended to the compressed data of the video
signal to generate a bit stream. Therefore, the
generation of a composite picture desired at the coding side
can be ensured and the stream data which are
transmitted continuously on time axis can be supported. In
addition, the coding/decoding synchronous reproduction of
audio signals, video signals and artificial scene data
when the coding is performed with clocks which are
different among the audio signal, the video signal and the
artificial scene data can be implemented with supporting
the interaction function at the decoding side and
without reducing the composition frame rate.
[0171] According to the ninth embodiment of the coding
apparatus of the present invention, when the
composition load is high, the composition processing is
controlled, the same clocks are supplied to the
composition circuit and the coding circuit for the video signal,
and the time stamp representing the composition timing
is appended to the compressed data of the video signal
to generate a bit stream. Therefore, the generation of a
composite picture desired at the coding side can be
ensured and the stream data which are transmitted
continuously on time axis can be supported. In addition, the
coding/decoding synchronous reproduction of audio
signals, video signals and artificial scene data when the
coding is performed with clocks which are different
among the audio signal, the video signal and the
artificial scene data can be implemented with supporting the
interaction function at the decoding side and without
reducing the frame rate of video signal.
[0172] According to the decoding apparatus of the 
present invention, the composition processing is per- 
formed by using the time stamp representing the com- 
position timing that is added to the bit stream. 
Therefore, the generation of a composite picture 
desired at the coding side can be ensured and the 
stream data that are transmitted continuously on time 
axis can be supported. In addition, the coding/decoding 
synchronous reproduction of audio signals, video sig- 
nals and artificial scene data can be implemented with 
supporting the interaction function at the decoding side. 
[0173] According to the second embodiment of the 
decoding apparatus of the present invention, the com- 
position circuit and the decoding apparatus for the com- 
pressed artificial scene data are driven by using clocks 
generated with a reference clock value which is 
appended to the compressed data of the artificial scene 
data in the bit stream, and the composition processing 
is performed by using the time stamp representing the 
composition timing appended to the compressed data 
of the artificial scene data. Therefore, the generation of 
a composite picture desired at the coding side can be 
ensured and the stream data that are transmitted con- 
tinuously on time axis can be supported. In addition, the 
coding/decoding synchronous reproduction of audio 
signals, video signals and artificial scene data when the 
coding is performed with clocks which are different 
among the audio signal, the video signal and the artifi- 
cial scene data can be implemented with supporting the 
interaction function at the decoding side. 
[0174] According to the third embodiment of the 
decoding apparatus of the present invention, the com- 
position circuit and the decoding apparatus for the com- 
pressed data of the video signal are driven by using 
clocks generated with a reference clock value which is 
appended to the compressed data of the video signal in 
the bit stream, and the composition processing is per- 
formed by using the time stamp representing the com- 
position timing appended to the compressed data of the 
video signal. Therefore, the generation of a composite 
picture desired at the coding side can be ensured and 
the stream data that are transmitted continuously on 
time axis can be supported. In addition, the cod- 
ing/decoding synchronous reproduction of audio sig- 
nals, video signals and artificial scene data when the 
coding is performed with clocks which are different 
among the audio signal, the video signal and the artificial
scene data can be implemented with supporting the
interaction function at the decoding side. 
[0175] According to the coding/decoding system of
the present invention, the coding/decoding system is
constituted by a proper combination of the coding
apparatus of the present invention and the decoding apparatus
of the present invention. Therefore, the generation of a
composite picture desired at the coding side can be
ensured and the stream data that are transmitted
continuously on time axis can be supported. In addition, the
coding/decoding synchronous reproduction of audio
signals, video signals and artificial scene data can be
implemented with the operation and effect obtained by the
combination of the coding apparatus and the decoding
apparatus and with supporting the interaction function
at the decoding side.

[0176] According to the bit stream of the present
invention, the time stamp representing the decoding
timing and the time stamp representing the composition
timing can be made common to each other. Therefore,
the generation of a composite picture desired at the
coding side can be ensured and the stream data that
are transmitted continuously on time axis can be
supported. In addition, the coding/decoding synchronous
reproduction of audio signals, video signals and artificial
scene data when the coding is performed with clocks
which are different among the audio signal, the video
signal and the artificial scene data can be implemented
with supporting the interaction function at the decoding
side and reducing overhead information.
[0177] According to the tenth embodiment of the coding
apparatus of the present invention, the time stamp
representing the decoding timing and the time stamp
representing the composition timing are made common
by using a flag to generate a bit stream. Therefore, the
generation of a composite picture desired at the coding
side can be ensured and the stream data that are
transmitted continuously on time axis can be supported. In
addition, the coding/decoding synchronous reproduction
of audio signals, video signals and artificial scene
data when the coding is performed with clocks which
are different among the audio signal, the video signal
and the artificial scene data can be implemented with
supporting the interaction function at the decoding side
and reducing overhead information.
[0178] According to the fourth embodiment of the
decoding apparatus of the present invention, the
decoding processing is performed by using the bit stream
which is obtained by making common the time stamp
representing the decoding timing and the time stamp
representing the composition timing with a flag.
Therefore, the generation of a composite picture desired at
the coding side can be ensured and the stream data
that are transmitted continuously on time axis can be
supported. In addition, the coding/decoding synchronous
reproduction of audio signals, video signals and
artificial scene data when the coding is performed with
clocks which are different among the audio signal, the
video signal and the artificial scene data can be
implemented with supporting the interaction function at the
decoding side and reducing overhead information.
[0179] According to another embodiment of the
coding/decoding system of the present invention, the system
uses the coding apparatus and the decoding apparatus
using the bit stream which is obtained by making com- 
mon the time stamp representing the decoding timing 
and the time stamp representing the composition timing 
with a flag. Therefore, the generation of a composite 
picture desired at the coding side can be ensured and 
the stream data that are transmitted continuously on 
time axis can be supported. In addition, the cod- 
ing/decoding synchronous reproduction of audio sig- 
nals, video signals and artificial scene data when the 
coding is performed with clocks which are different 
among the audio signal, the video signal and the artifi- 
cial scene data can be implemented with supporting the 
interaction function at the decoding side and reducing 
overhead information. 

Claims 

1. A coding apparatus comprising:

audio signal coding means for coding an audio
signal;

video signal coding means for coding a video 
signal; 

interface means for accepting information on a 
composite scene; 

scene data coding means for coding scene 
data supplied from said interface means; 
composition means for composing a scene 
from the audio signal supplied from said audio 
signal coding means, the video signal supplied 
from said video signal coding means and the 
composite scene data supplied from said 
scene data coding means; 
display means for reproducing/displaying the 
composite picture signal and the audio signal 
supplied from said composition means; 
clock supply means for supplying clocks to said 
audio signal coding means, said video signal 
coding means, said scene data coding means 
and said composition means; and 
multiplexing means for creating a bit stream on 
the basis of the time information and com- 
pressed audio data supplied from said audio 
signal coding means, the time information and 
compressed video data supplied from said 
video signal coding means, the time informa- 
tion and compressed scene data supplied from 
said scene data coding means, the time infor- 
mation supplied from said composition means 
and the clock value supplied from said clock 
supplying means. 






2. The coding apparatus as claimed in claim 1, further
comprising means for detecting the status of said 
composition means and controlling the operation of 
said video signal coding means. 

3. The apparatus as claimed in claim 1 or 2, further 
comprising means for detecting the status of said 
audio signal coding means, the status of said video 
signal coding means and the status of said scene 
data coding means, and controlling the operation of
said composition means. 

4. The apparatus as claimed in claim 1, 2 or 3,
wherein said clock supply means includes first
clock supply means for supplying clocks to said
audio signal coding means, second clock supply
means for supplying clocks to said video signal coding
means and third clock supply means for supplying
clocks to said scene data coding means and
composition means, and said multiplexing means
multiplexes the clock values supplied from said first,
second, and third clock supply means respectively.

5. The apparatus as claimed in claim 1, 2 or 3,
wherein said clock supply means includes first
clock supply means for supplying clocks to said
audio signal coding means, second clock supply
means for supplying clocks to said video signal coding
means and composition means, and third clock
supply means for supplying clocks to said scene
data coding means, and said multiplexing means
multiplexes the clock values supplied from said first,
second, and third clock supply means respectively.

6. A decoding apparatus comprising:

means for separating both of compressed data
and time information of an audio signal, both of
compressed data and time information of a
video signal, both of compressed data and time
information of scene data, time information of
scene composition and clock information from
a bit stream;

means for decoding the audio signal on the
basis of the compressed data and time information
of said audio signal;
means for decoding the video signal on the
basis of the compressed data and time information
of the video signal;

means for decoding the scene data on the
basis of the compressed data and time information
of the scene data;
means for composing a scene on the basis of
the time information for the scene composition
supplied from said separation means, the
audio signal supplied from said decoding
means for the audio signal, the video signal
supplied from said decoding means for the
video signal and the scene data supplied from
said decoding means for the scene data;
means for generating clocks according to the 
clock value supplied from said separating 
means and supplying the clocks to said decod- 
ing means for the audio signal, said decoding 
means for the video signal, said decoding 
means for the scene data and said composition 
means; 

means for reproducing/displaying the compos- 
ite picture signal and the audio signal supplied 
from said composition means; and 
interface means for accepting an interaction 
from a viewer to the composite picture. 

7. The decoding apparatus as claimed in claim 6, 
wherein said separation means separates a plural- 
ity of independent clock values from said bit stream, 
and the independent clock values are input to 
means for supplying the clocks to said decoding 
means for the audio signal, means for supplying the 
clocks to said decoding means for the video signal, 
and means for supplying the clocks to said decod- 
ing means for the scene data and said composition 
means. 

8. The decoding apparatus as claimed in claim 6,
wherein said separation means separates a plural- 
ity of independent clock values from said bit stream, 
and the independent clock values are input to 
means for supplying the clocks to said decoding 
means for the audio signal, means for supplying the 
clocks to said decoding means for the video signal 
and said composition means, and means for sup- 
plying the clocks to said decoding means for the 
scene data. 

9. A coding/decoding system comprising said coding 
apparatus as claimed in any one of claims 1 to 5 
and said decoding apparatus as claimed in claim 6, 
7 or 8. 

10. A multiplexed bit stream comprising an audio sig- 
nal, a video signal and scene data, characterized in 
that a flag representing whether time information 
representing a decoding timing doubles as time 
information representing a composition timing is 
added to said time information. 

11. The coding apparatus as claimed in any one of 
claims 1 to 5, wherein said coding apparatus gener- 
ates said bit stream as claimed in claim 9. 

12. The decoding apparatus as claimed in claim 6, 7 or
8, wherein said decoding apparatus decodes said 
bit stream as claimed in claim 9. 

13. A coding/decoding system comprising said coding
apparatus as claimed in claim 11 and said decoding
apparatus as claimed in claim 12.